Introduction

Portal hypertension (PH) is a set of clinical syndromes caused by increased pressure in the portal venous system and is one of the primary consequences of chronic liver diseases (CLD), which can lead to the formation of extensive collateral circulation [1]. Clinically significant portal hypertension (CSPH) is defined as a hepatic venous pressure gradient (HVPG) ≥ 10 mmHg, which can result in clinical complications of PH such as esophageal varices (EV), ascites, hepatic encephalopathy, and hepatorenal syndrome. Furthermore, severe portal hypertension (SPH), defined as HVPG ≥ 12 mmHg, is a risk factor for variceal bleeding [2]. EV represent the most important collateral circulation in PH and occur in approximately 50% of cirrhotic patients, while variceal bleeding is associated with high mortality [3, 4]. Therefore, timely detection and accurate assessment are important in patients with PH and EV to ensure appropriate patient management.

HVPG and esophagogastroduodenoscopy (EGD) are currently considered the gold standards for evaluating PH and EV, respectively [5, 6]. However, both HVPG measurement and EGD are invasive and potentially associated with complications, and their application is limited by poor patient compliance [7]. In addition, HVPG measurement requires demanding equipment and professional technicians, so it is difficult to carry out routinely in clinical practice. Hence, alternative noninvasive techniques with favorable diagnostic performance for evaluating PH and EV would be extremely attractive.

Elasticity imaging techniques, including ultrasound elastography (USE) and magnetic resonance elastography (MRE), have been used to assess changes in spleen stiffness in various diseases [8]. Recent studies have shown that spleen stiffness is related to the progression of hepatic fibrosis, and that in patients with hepatitis B/C infection, spleen stiffness is increased even when liver stiffness is unchanged [9, 10]. Subsequent studies have demonstrated that spleen stiffness is positively correlated with HVPG and performs well in predicting CSPH and EV in CLD patients [11, 12]. Other studies have indicated that although spleen stiffness is associated with PH, it is not sufficient to accurately assess the severity of PH [13]. Further studies have suggested that spleen stiffness measurement (SSM) could reliably rule out the presence of high-risk esophageal varices (HREV) in cirrhotic patients, independently of the etiology of cirrhosis [14, 15]. Therefore, the aim of this meta-analysis is to comprehensively assess the diagnostic performance of SSM for evaluating PH and EV in patients with CLD.

Materials and methods

This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) [16], and this review was registered in the International Prospective Register of Systematic Reviews (PROSPERO, http://www.crd.york.ac.uk/PROSPERO): CRD42019122407.

Literature search

To identify studies evaluating SSM for the diagnosis of CSPH, SPH, any EV, or HREV in CLD patients, a systematic literature search was performed in PubMed, Embase, and Web of Science up to 30 April 2020. The Medical Subject Headings (MeSH) terms and free-text terms used were as follows: spleen stiffness, portal hypertension, esophageal varices, chronic liver diseases, elastography, and diagnosis. To make the search more comprehensive, the references of eligible articles were also screened manually.

Selection criteria

Eligible studies were selected by two reviewers independently, with disagreements resolved by consensus. Studies were eligible if they met the following criteria. (1) The accuracy of SSM was evaluated for the diagnosis of CSPH, SPH, EV, or HREV in adults with CLD. (2) HVPG was used as the reference standard for portal pressure, and EGD was used as the reference standard for EV [17]. (3) Sufficient data were provided to calculate the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts of SSM for detecting CSPH, SPH, EV, or HREV. (4) At least 30 patients were evaluated to obtain good reliability. (5) Full articles were available and written in English. Duplicate publications, animal studies, and ex vivo studies were excluded.
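
Criterion (3) matters because every per-study accuracy measure used later follows from the 2 × 2 counts. The following is a minimal Python sketch of that derivation; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Per-study accuracy measures from a 2 x 2 table (illustrative helper)."""
    if min(tp, fp, tn, fn) <= 0:
        # Zero cells make the ratios undefined; a continuity correction
        # would be needed, which this sketch does not attempt.
        raise ValueError("all four cells must be positive in this sketch")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PLR": sensitivity / (1.0 - specificity),  # positive likelihood ratio
        "NLR": (1.0 - sensitivity) / specificity,  # negative likelihood ratio
        "DOR": (tp * tn) / (fp * fn),              # diagnostic odds ratio
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A study that reports only sensitivity, specificity, and group sizes can be converted back to these four counts, which is what "extracted directly or calculated" refers to in the next section.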

Data extraction and quality assessment

Two reviewers independently extracted data and evaluated the quality of the included studies; disagreements were resolved by consensus. The following data were retrieved: first author, publication year, location, study design, technique of SSM, proportion of successful SSM, gold standard, number of patients, age, sex, body mass index (BMI), proportion of cirrhosis, etiology of CLD, Child–Pugh score, and cutoff values. TP, FP, TN, and FN were extracted directly or calculated. When a study reported both training and validation cohorts, data were extracted from the validation cohort only. The quality of the studies was assessed according to the Quality Assessment of Diagnostic Accuracy Studies 2 tool (QUADAS-2) [18].

Statistical analysis and data synthesis

Summary sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV), and diagnostic odds ratio (DOR) with corresponding 95% confidence intervals (CI) were calculated using the bivariate random-effects model to examine the diagnostic accuracy of SSM. Afterwards, the hierarchical summary receiver operating characteristic (HSROC) curve and the area under the curve (AUC) were calculated. Heterogeneity was evaluated using the Cochran Q test and the Higgins inconsistency index (I²), with p < 0.05 or I² > 50% suggesting substantial heterogeneity [19, 20]. Sensitivity analysis was performed by restricting the analysis to patients with chronic viral liver disease. Univariate meta-regression analysis and subgroup analysis were also used to explore possible sources of heterogeneity. The covariates included the following: (1) measurement technique (MRE vs. USE), (2) study location (European vs. Asian), (3) study design (prospective vs. retrospective or cross-sectional), (4) prevalence of diseases (≥ 50% vs. < 50%), (5) proportion of cirrhosis (total vs. mixed sample), (6) etiology of CLD (viral vs. mixed), (7) proportion of Child A (≥ 50% vs. < 50%), (8) success rate of SSM (≥ 90% vs. < 90%). Fagan plots were used to assess the clinical utility of SSM for diagnosing CSPH, SPH, EV, and HREV [21]. Publication bias was assessed by Deeks’ funnel plot, with a value of p < 0.1 for the slope coefficient suggesting significant asymmetry [22]. All of the above analyses were performed using the “midas” and “metandi” modules of Stata version 13.0 (StataCorp).
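
The heterogeneity statistics named above can be illustrated with a short calculation. The sketch below is in Python rather than Stata and is not the code used in this study: it computes Cochran's Q with inverse-variance weights and derives I² as (Q − df)/Q, truncated at zero. The function name and the input values (study-level effects, for example log DORs, and their variances) are hypothetical.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, df, I2 in percent) for study-level effects and variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to between-study
    # heterogeneity; negative values are truncated to zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log-DOR values and variances, for illustration only.
q, df, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.25, 0.25, 0.25])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

The p value for Q comes from a chi-square distribution with df degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.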

Results

Search results and study characteristics

The flow chart summarizing the literature screening is illustrated in Fig. 1. A total of 379 initial articles were identified with the predefined search strategies; after 146 duplicates were removed, 165 irrelevant studies were further eliminated; 68 studies were left for further evaluation. Of these, 36 articles were excluded after full-text review for the following reasons: undesirable article types, not diagnostic accuracy study, not relevant to CLD, small sample size (fewer than 30 participants), insufficient data (TP, FP, TN, and FN not reported or could not be calculated), and not in English. Ultimately, 32 articles estimating the accuracy of SSM for the diagnosis of PH and/or EV were included [11, 13,14,15, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50].

Fig. 1 Flow chart of study selection process

The detailed characteristics of the 32 studies are summarized in Tables 1 and 2, grouped by gold standard (HVPG and EGD, respectively). A total of 3952 patients with an average age of 58.8 years were investigated. The 32 original articles included 15 prospective studies, 4 retrospective studies, and 13 cross-sectional studies. The results of the quality assessment are shown in Fig. 2. Most studies were rated as low risk for bias and applicability concerns, with all of the studies satisfying four or more of the seven domains (Supplementary Table 1).

Table 1 Characteristics of the studies evaluating the performance of spleen stiffness measurement (SSM) for the detection of portal hypertension
Table 2 Characteristics of the studies evaluating the performance of spleen stiffness measurement (SSM) for the detection of esophageal varices
Fig. 2 Quality assessment of the included studies according to Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria

Diagnostic accuracy of SSM for the detection of CSPH

The performance of SSM for the diagnosis of CSPH was evaluated in 7 studies. The pooled sensitivity and specificity of SSM for detecting CSPH were 0.85 (95% CI, 0.69–0.93) and 0.86 (95% CI, 0.74–0.93), respectively (Fig. 3a). The pooled PLR, NLR, and DOR were 5.95 (95% CI, 3.35–10.55), 0.18 (95% CI, 0.09–0.35), and 33.76 (95% CI, 16.72–68.16), respectively. Figure 4a illustrates the HSROC curve with an AUC of 0.92 (95% CI, 0.89–0.94).

Fig. 3 Sensitivity and specificity forest plots of spleen stiffness measurement (SSM) for detecting CSPH (a), SPH (b), EV (c), and HREV (d)

Fig. 4 Hierarchical summary receiver operating characteristic (HSROC) curve of spleen stiffness measurement (SSM) for detecting CSPH (a), SPH (b), EV (c), and HREV (d)

Diagnostic accuracy of SSM for the detection of SPH

The performance of SSM for the diagnosis of SPH was evaluated in 7 studies. The pooled sensitivity and specificity of SSM for detecting SPH were 0.84 (95% CI, 0.75–0.90) and 0.84 (95% CI, 0.72–0.91), respectively (Fig. 3b). The pooled PLR, NLR, and DOR were 5.17 (95% CI, 2.94–9.10), 0.19 (95% CI, 0.12–0.30), and 27.47 (95% CI, 12.79–59.00), respectively. Figure 4b illustrates the HSROC curve with an AUC of 0.91 (95% CI, 0.88–0.93).

Diagnostic accuracy of SSM for the detection of any EV

The diagnostic accuracy of SSM for any EV was evaluated in 20 studies. The pooled sensitivity and specificity of SSM for detecting any EV were 0.90 (95% CI, 0.83–0.94) and 0.73 (95% CI, 0.66–0.79), respectively (Fig. 3c). The pooled PLR, NLR, and DOR were 3.34 (95% CI, 2.63–4.24), 0.14 (95% CI, 0.08–0.23), and 23.84 (95% CI, 12.70–44.74), respectively. Figure 4c illustrates the HSROC curve with an AUC of 0.87 (95% CI, 0.84–0.90). When the analysis was restricted to the 8 studies performed exclusively in chronic viral liver disease, the pooled sensitivity and specificity were 0.85 (95% CI, 0.72–0.92) and 0.76 (95% CI, 0.67–0.84), with an AUC of 0.86 (95% CI, 0.83–0.89). This restriction did not significantly improve the diagnostic performance of SSM.

Diagnostic accuracy of SSM for the detection of HREV

The diagnostic accuracy of SSM for HREV was evaluated in 17 studies. HREV were variably defined in the included studies (Table 2). The pooled sensitivity and specificity of SSM for detecting HREV were 0.87 (95% CI, 0.77–0.93) and 0.66 (95% CI, 0.53–0.77), respectively (Fig. 3d). The pooled PLR, NLR, and DOR were 2.56 (95% CI, 1.76–3.72), 0.20 (95% CI, 0.10–0.38), and 13.01 (95% CI, 5.19–32.64), respectively. Figure 4d illustrates the HSROC curve with an AUC of 0.83 (95% CI, 0.79–0.86). On the basis of these values, and assuming a 29.9% prevalence of HREV (as observed in the included studies), the pooled PPV and NPV were 0.54 (95% CI, 0.47–0.62) and 0.88 (95% CI, 0.81–0.95), respectively. Considering the pooled NPV and the prevalence of HREV in the included studies, 50.6% (95% CI, 43.4–59.0%) of patients would avoid endoscopy, with a risk of missing HREV of 8.4% (95% CI, 4.1–17.2%) among patients with “negative” SSM results and 4.7% (95% CI, 2.3–9.4%) among the overall population of 2214 patients evaluated (Table 3).
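
The logic behind the "spared endoscopy" and "missed HREV" figures can be sketched with a simple point calculation. The Python below is an illustrative assumption, not the authors' procedure: it treats sensitivity, specificity, and prevalence as fixed numbers and ignores their uncertainty, so it should only be expected to land in the same range as the reported values rather than reproduce them. The function name `triage_summary` is hypothetical.

```python
def triage_summary(sensitivity, specificity, prevalence):
    """Fractions of a cohort that test negative and that are missed."""
    false_negative = prevalence * (1.0 - sensitivity)  # diseased, test negative
    true_negative = (1.0 - prevalence) * specificity   # healthy, test negative
    test_negative = false_negative + true_negative
    return {
        # Patients with a negative SSM result would not be sent to endoscopy.
        "spared_endoscopy": test_negative,
        "missed_among_negatives": false_negative / test_negative,
        "missed_overall": false_negative,
    }


# Pooled HREV point estimates quoted in the text above.
summary = triage_summary(sensitivity=0.87, specificity=0.66, prevalence=0.299)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```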

Table 3 Summary diagnostic accuracy and the post-test probabilities of spleen stiffness measurement (SSM) for CSPH, SPH, EV, and HREV

Significant heterogeneity among studies was observed in DOR (p < 0.001). Deeks’ funnel plots showed no significant publication bias among the studies (p = 0.60, 0.95, 0.15, 0.14) (Supplementary Fig. 1).

Results of meta-regression and subgroup analysis

Univariate meta-regressions showed that the types of elastography technique, study location, study design, prevalence of diseases, etiology of CLD, proportion of Child A, and success rate of SSM were associated with the heterogeneity. SSM showed better performance for the diagnosis of any EV in Asian populations than in European populations. In addition, compared with the studies having a success rate of SSM < 90%, studies with a success rate ≥ 90% had a lower specificity for the diagnosis of any EV. The details of subgroup analysis are demonstrated in Table 4.

Table 4 Results of subgroup analysis of spleen stiffness measurement (SSM) for the diagnosis of CSPH, SPH, EV, and HREV

Clinical utility of SSM for detecting CSPH, SPH, EV, and HREV

The Fagan plot analysis indicated that, at a pre-test probability of 50%, SSM was very informative: the probability of CSPH rose to 86% following a “positive” measurement and fell to 15% following a “negative” measurement, and the probability of SPH following a “positive” measurement reached 84%. However, the post-test probability following a “positive” measurement did not exceed 80% for any EV or HREV at the same pre-test probability (Table 3).
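
A Fagan plot is a graphical form of Bayes' theorem on the odds scale: pre-test odds multiplied by the likelihood ratio give post-test odds. The Python sketch below shows that conversion using the pooled likelihood ratios reported in the Results; the function name is hypothetical, and the calculation ignores the confidence intervals.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pooled positive likelihood ratios as reported in the Results section.
pooled_plr = {"CSPH": 5.95, "SPH": 5.17, "EV": 3.34, "HREV": 2.56}

for target, plr in pooled_plr.items():
    probability = post_test_probability(0.5, plr)
    print(f"{target}: {probability:.0%} after a positive SSM")

# Pooled negative likelihood ratio for CSPH.
print(f"CSPH: {post_test_probability(0.5, 0.18):.0%} after a negative SSM")
```

With a pre-test probability of 50%, this gives about 86% and 84% for CSPH and SPH after a positive result, about 15% for CSPH after a negative result, and values below 80% for any EV and HREV, in line with the figures quoted above.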

Discussion

The results of this meta-analysis indicate that spleen stiffness measured by current techniques has fairly good accuracy for the detection of PH and EV in CLD patients. The AUCs for the diagnosis of CSPH and SPH exceeded 0.90, and the AUCs for the diagnosis of any EV and HREV were 0.87 and 0.83, respectively. SSM was able to predict the presence of CSPH with good sensitivity and specificity (85% and 86%, respectively). Notably, the pooled sensitivity and NPV of SSM for detecting HREV were fairly good, at 0.87 (95% CI, 0.77–0.93) and 0.88 (95% CI, 0.81–0.95), respectively, which suggests that HREV could be ruled out in most CLD patients evaluated by SSM, thereby avoiding unnecessary endoscopy.

PH results in progressive splenomegaly and splenic remodeling; passive congestion, increased arterial blood flow, and fibrogenesis may all increase spleen stiffness, which supports the physiological plausibility of SSM for detecting PH and EV [51, 52]. Previous studies have confirmed that USE shows good diagnostic performance for significant liver fibrosis and liver cirrhosis [53, 54]. MRE is a newly developed method for quantitatively evaluating the elasticity of living tissue that provides full-field-of-view elastograms of the abdomen with excellent diagnostic accuracy for staging hepatic fibrosis [55, 56]. Studies have demonstrated that MRE-based spleen stiffness is strongly associated with the presence of EV; with a cutoff value of 7.23 kPa, SSM showed good performance for detecting EV in patients with cirrhosis, with an AUC of 0.83 (95% CI, 0.76–0.89) [33, 38]. In the past several years, MRE-based spleen stiffness has been suggested as a valid parameter for identifying the presence of EV [57].

The prevalence of varices needing treatment (VNT) is very low in patients with compensated cirrhosis [58]. Previous studies suggest that liver stiffness measurement (LSM) plus platelet count can be used to exclude the presence of HREV in patients with Child–Pugh A cirrhosis [59]. However, the performance of LSM alone in predicting PH is controversial because results have been inconsistent, possibly because LSM is affected by confounding factors such as hepatocyte inflammation and cholestasis, and because it reflects only the increase in intrahepatic resistance to portal blood flow and cannot account for dynamic changes in splanchnic blood flow [8]. In a meta-analysis focusing on the diagnostic performance of LSM, the DORs for evaluating any EV and HREV were 7.54 (95% CI, 4.46–12.73) and 8.85 (95% CI, 5.93–13.19), respectively [60]. In our meta-analysis, the corresponding DORs of SSM were 23.84 (95% CI, 12.70–44.74) and 13.01 (95% CI, 5.19–32.64), respectively. These results suggest that the diagnostic accuracy of SSM for detecting EV is higher than that of LSM. Considering the pooled NPV (0.88) and the prevalence of HREV observed in the included studies (29.9%), a total of 1120 (50.6%) patients would avoid endoscopy, with a risk of missing HREV of 4.7% among the overall 2214 patients evaluated. Compared with the Expanded-Baveno VI criteria, SSM would spare more unnecessary endoscopies (50.6% vs. 40.0%); however, the proportion of HREV missed would increase as well (4.7% vs. 1.6%) [61]. The higher miss rate may be explained by the prevalence of HREV, which is considerably greater in our meta-analysis than in the cohort of the Expanded-Baveno VI criteria (29.9% vs. 9.9%), because the NPV depends on the prevalence of disease. When the prevalence is high, the NPV is relatively low, resulting in an increased rate of missed diagnoses. Accordingly, our meta-analysis demonstrated that SSM was useful for ruling out the presence of HREV in CLD patients, and a new model combining SSM with other noninvasive criteria would probably allow more endoscopies to be avoided safely [62].
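
The dependence of NPV on prevalence described above follows directly from Bayes' theorem. As a minimal illustration in Python, the sketch below holds the pooled HREV sensitivity and specificity fixed and evaluates the NPV at the two prevalences compared in the text. Applying our pooled estimates to the 9.9% prevalence of the other cohort is a hypothetical exercise, not a result of either study, and the function name is an assumption.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = TN / (TN + FN), with cells expressed as cohort fractions."""
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


# Same test characteristics, two different disease prevalences.
for prevalence in (0.099, 0.299):
    npv = negative_predictive_value(0.87, 0.66, prevalence)
    print(f"prevalence {prevalence:.1%}: NPV {npv:.3f}")
```

With the test characteristics unchanged, the NPV is lower at the higher prevalence, which is the direction of the effect invoked to explain the higher miss rate.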

Considerable heterogeneity was observed in our study, and a meta-regression analysis was performed to identify probable causes. We observed that the diagnostic performance of SSM for detecting any EV was better in Asian populations than in European populations. Previous studies have shown that BMI and central obesity are independent factors influencing the failure and unreliability of USE [63]. The mean BMI of European subjects was higher (range: 23.0–27.0 kg/m²) than that of Asian subjects (range: 20.8–24.6 kg/m²). In addition, compared with studies with a success rate of SSM < 90%, studies with a success rate ≥ 90% had a lower specificity for detecting any EV. This may be related to spleen thickness, which may have affected the success rate of SSM: when the spleen was less than 4 cm thick, the success rate of SSM was low. Furthermore, the prevalence of EV increases with the degree of splenomegaly, which would lead to a decrease in the specificity of detection.

The main strength of our study is that we comprehensively evaluated the diagnostic accuracy of spleen stiffness, measured by different techniques including USE and MRE, across a variety of populations and chronic liver diseases. Therefore, the results of our meta-analysis should reflect the diagnostic performance of SSM for detecting PH and EV in real-world settings. In addition, we separately assessed the diagnostic accuracy of SSM in detecting CSPH, SPH, any EV, and HREV, in order to evaluate the clinical application value of SSM comprehensively.

This study has several limitations. First, considerable heterogeneity was detected across the included studies, attributable to the type of elastography technique, study location, study design, the prevalence of disease, and several other covariates that were not recorded in the included studies. Second, the number of eligible studies was relatively small, with only 3 studies assessing MRE, and some studies with relatively small samples were included in our meta-analysis. In the future, large-sample, multicenter studies are needed for a more comprehensive evaluation. In addition, our meta-analysis included only studies written in English, putting the results at risk of language bias. Considering these limitations, caution must be taken when interpreting the results of our study.

In conclusion, SSM is a promising method for detecting PH and EV with good diagnostic accuracy, and it could be a helpful noninvasive surveillance tool for clinicians in the management of CLD patients. In addition, SSM could rule out the presence of HREV in most CLD patients and could be used as an initial screening method, thereby avoiding unnecessary endoscopy. Future prospective studies with larger sample sizes and in diverse clinical settings are required to further assess the effectiveness of SSM.