Diagnostic accuracy of spleen stiffness to evaluate portal hypertension and esophageal varices in chronic liver disease: a systematic review and meta-analysis

Objectives To systematically review studies on the diagnostic accuracy of spleen stiffness measurement (SSM) for the detection of clinically significant portal hypertension (CSPH), severe portal hypertension (SPH), esophageal varices (EV), and high-risk esophageal varices (HREV) in patients with chronic liver diseases (CLD).
Methods Through a systematic search, we identified 32 studies reporting the accuracy of SSM for the diagnosis of portal hypertension (PH) and/or EV in adults with CLD. A bivariate random-effects model was used to estimate pooled sensitivity, specificity, likelihood ratios, positive predictive value (PPV), negative predictive value (NPV), and diagnostic odds ratios (DOR). The clinical utility of SSM was evaluated by Fagan plot.
Results A total of 32 studies assessing 3952 patients were included in this meta-analysis. The pooled sensitivities of SSM were 0.85 (95% confidence interval (CI), 0.69–0.93) for CSPH; 0.84 (95% CI, 0.75–0.90) for SPH; 0.90 (95% CI, 0.83–0.94) for any EV; and 0.87 (95% CI, 0.77–0.93) for HREV. The pooled specificities of SSM were 0.86 (95% CI, 0.74–0.93) for CSPH; 0.84 (95% CI, 0.72–0.91) for SPH; 0.73 (95% CI, 0.66–0.79) for any EV; and 0.66 (95% CI, 0.53–0.77) for HREV. Summary PPV and NPV of SSM for detecting HREV were 0.54 (95% CI, 0.47–0.62) and 0.88 (95% CI, 0.81–0.95), respectively.
Conclusions Our meta-analysis suggests that SSM could be used as a helpful surveillance tool in the management of CLD patients and is useful for ruling out the presence of HREV, thereby avoiding unnecessary endoscopy.
Key Points
• SSM could be used to rule out the presence of HREV in patients with CLD, thereby avoiding unnecessary endoscopy.
• SSM has significant diagnostic value for CSPH and SPH, with high sensitivity and specificity in patients with CLD.
• SSM could be used as a helpful surveillance tool for clinicians managing CLD patients.
Electronic supplementary material The online version of this article (10.1007/s00330-020-07223-8) contains supplementary material, which is available to authorized users.


Introduction
Portal hypertension (PH) is a set of clinical syndromes caused by increased pressure in the portal venous system and is one of the primary consequences of chronic liver diseases (CLD), which can lead to the formation of extensive collateral circulation [1]. Clinically significant portal hypertension (CSPH) is defined as a hepatic venous pressure gradient (HVPG) ≥ 10 mmHg, which can result in clinical complications of PH such as esophageal varices (EV), ascites, hepatic encephalopathy, and hepatorenal syndrome. Furthermore, severe portal hypertension (SPH), defined as HVPG ≥ 12 mmHg, is a risk factor for variceal bleeding [2]. EV is the most important collateral circulation of PH and occurs in approximately 50% of cirrhotic patients, while variceal bleeding is associated with high mortality [3,4]. Therefore, timely detection and accurate assessment are important in patients with PH and EV to ensure appropriate patient management.
HVPG and esophagogastroduodenoscopy (EGD) are currently considered the gold standards for evaluating PH and EV, respectively [5,6]. However, HVPG measurement and EGD are invasive and potentially associated with complications, and the application of both methods is limited by poor patient compliance [7]. In addition, HVPG measurement requires specialized equipment and professional technicians, so it is difficult to carry out routinely in clinical practice. Hence, alternative noninvasive techniques with favorable diagnostic performance for evaluating PH and EV would be extremely attractive.
Elasticity imaging techniques including ultrasound elastography (USE) and magnetic resonance elastography (MRE) have been used to assess changes in spleen stiffness in various diseases [8]. Recent studies have shown that spleen stiffness is related to the progression of hepatic fibrosis, and in patients with hepatitis B/C infection, spleen stiffness is increased even though the liver stiffness is unchanged [9,10]. Subsequent studies have demonstrated that spleen stiffness was positively correlated with HVPG and has good performance in predicting CSPH and EV in CLD patients [11,12]. Other studies have indicated that although spleen stiffness is associated with PH, it is not sufficient to accurately assess the severity of PH [13]. Further studies have suggested that SSM could reliably rule out the presence of high-risk esophageal varices (HREV) in cirrhotic patients, independently of the etiology of cirrhosis [14,15]. Therefore, the aim of this meta-analysis is to comprehensively assess the diagnostic performance of SSM for evaluating PH and EV in patients with CLD.

Materials and methods
This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) [16], and this review was registered in the International Prospective Register of Systematic Reviews (PROSPERO, http://www.crd.york.ac.uk/PROSPERO): CRD42019122407.

Literature search
To identify studies evaluating SSM for the diagnosis of CSPH, SPH, any EV, or HREV in CLD patients, a systematic literature search was performed in PubMed, Embase, and Web of Science up to 30 April 2020. The Medical Subject Headings (MeSH) terms and free-text terms used were as follows: spleen stiffness, portal hypertension, esophageal varices, chronic liver diseases, elastography, and diagnosis. To capture additional potentially suitable studies, a manual search was carried out by screening the references of eligible articles.

Selection criteria
Eligible studies were selected by two reviewers independently, with disagreements resolved by consensus. The eligible studies were identified according to the following criteria. (1) The accuracy of SSM was evaluated for the diagnosis of CSPH, SPH, EV, or HREV in adults with CLD. (2) Portal pressure was evaluated using HVPG, and EGD was used as the reference standard for EV [17]. (3) Sufficient data were provided to calculate the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts of SSM for detecting CSPH, SPH, EV, or HREV. (4) At least 30 patients were evaluated to obtain good reliability. (5) Full articles were available and written in English. Duplicate publications, animal studies, and ex vivo studies were excluded.

Data extraction and quality assessment
Two reviewers independently extracted data and evaluated the quality of the included studies; disagreements were resolved by consensus. The following data were retrieved: first author, publication year, location, study design, technique of SSM, proportion of successful SSM, gold standard, number of patients, age, sex, body mass index (BMI), proportion of cirrhosis, etiology of CLD, Child-Pugh score, and cutoff values. TP, FP, TN, and FN were extracted directly or calculated. When both training and validation cohorts were provided in the same study, we extracted data only from the validation cohort. The quality of the studies was assessed according to the Quality Assessment of Diagnostic Accuracy Studies 2 tool (QUADAS-2) [18].
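When a study reports only sensitivity, specificity, and group sizes, the 2 × 2 counts can be back-calculated. The following is a minimal sketch of that arithmetic, not the authors' extraction procedure; the function name and the example study are hypothetical.

```python
def two_by_two(sensitivity, specificity, n_diseased, n_healthy):
    """Back-calculate TP, FP, TN, FN from reported accuracy and group sizes.

    Assumes the reported values were computed on exactly these group sizes,
    so rounding to the nearest integer recovers the original counts.
    """
    tp = round(sensitivity * n_diseased)
    fn = n_diseased - tp
    tn = round(specificity * n_healthy)
    fp = n_healthy - tn
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical study: 60 patients with the target condition, 140 without.
counts = two_by_two(sensitivity=0.85, specificity=0.70,
                    n_diseased=60, n_healthy=140)
print(counts)
```

Because reported percentages are usually rounded, the reconstructed counts should be checked against any totals given in the original article.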

Statistical analysis and data synthesis
Summary sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV), and diagnostic odds ratio (DOR) with corresponding 95% confidence intervals (CI) were calculated using the bivariate random-effects model to examine the diagnostic accuracy of SSM. Afterwards, the hierarchical summary receiver operating characteristic (HSROC) curve and the area under the curve (AUC) were calculated. Heterogeneity was evaluated using the Cochran Q-test and the Higgins inconsistency index (I²), with p < 0.05 or I² > 50% suggesting substantial heterogeneity [19,20]. Sensitivity analysis was performed by restricting analysis to patients with chronic viral liver disease. Univariate meta-regression analysis and subgroup analysis were also used to explore possible sources of heterogeneity. The covariates included the following: (1) measurement technique (MRE vs. USE), (2) study location (European vs. Asian), (3) study design (prospective vs. retrospective or cross-sectional), (4) prevalence of diseases (≥ 50% vs. < 50%), (5) proportion of cirrhosis (total vs. mixed sample), (6) etiology of CLD (viral vs. mixed), (7) proportion of Child A (≥ 50% vs. < 50%), (8) success rate of SSM (≥ 90% vs. < 90%). Fagan plots were used to assess the clinical utility of SSM for diagnosing CSPH, SPH, EV, and HREV [21]. Publication bias was assessed by Deeks' funnel plot, with a value of p < 0.1 for the slope coefficient suggesting significant asymmetry [22]. All of the above analyses were performed using the "midas" and "metandi" modules of Stata version 13.0 (StataCorp).
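The per-study quantities and the heterogeneity statistics named above can be illustrated with a short sketch. This is a simplified univariate calculation on the log DOR with inverse-variance weights, not the bivariate model fitted in Stata for this analysis; the function names and the 0.5 continuity correction are assumptions made for illustration.

```python
import math


def accuracy_measures(tp, fp, tn, fn):
    """Sensitivity, specificity, likelihood ratios and DOR from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    return {"sens": sens, "spec": spec, "PLR": plr, "NLR": nlr,
            "DOR": plr / nlr}


def log_dor_with_variance(tp, fp, tn, fn):
    """Log DOR and its variance, adding 0.5 to every cell (assumed correction)."""
    a, b, c, d = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cochran_q_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 (in percent) for per-study effects."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical 2x2 tables (TP, FP, TN, FN) for three studies.
studies = [(51, 42, 98, 9), (30, 10, 50, 10), (80, 15, 60, 5)]
effects, variances = zip(*(log_dor_with_variance(*s) for s in studies))
q, i2 = cochran_q_i2(effects, variances)
print(accuracy_measures(*studies[0]))
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

Under the thresholds stated above, an I² above 50% from such a calculation would be read as substantial heterogeneity.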

Search results and study characteristics
The flow chart summarizing the literature screening is illustrated in Fig. 1. A total of 379 initial articles were identified with the predefined search strategies; after 146 duplicates were removed, 165 irrelevant studies were further eliminated, leaving 68 studies for further evaluation. Of these, 36 articles were excluded after full-text review for the following reasons: undesirable article types, not a diagnostic accuracy study, not relevant to CLD, small sample size (fewer than 30 participants), insufficient data (TP, FP, TN, and FN not reported or could not be calculated), and not in English. Ultimately, 32 articles estimating the accuracy of SSM for the diagnosis of PH and/or EV were included [11,13,14,15].
The detailed characteristics of the 32 studies are summarized in Tables 1 and 2, grouped by gold standard (HVPG and EGD, respectively). A total of 3952 patients with an average age of 58.8 years were investigated. The 32 original articles included 15 prospective studies, 4 retrospective studies, and 13 cross-sectional studies. The results of quality assessment of the studies are shown in Fig. 2. Most studies were identified as low-risk for risk of bias and applicability concerns, with all of the studies satisfying four or more of the seven total domains (Supplementary Table 1).

Diagnostic accuracy of SSM for the detection of CSPH
The performance of SSM for the diagnosis of CSPH was evaluated in 7 studies. The pooled sensitivity and specificity of spleen stiffness for detecting CSPH were 0.85 (95% CI, 0.69-0.93) and 0.86 (95% CI, 0.74-0.93), respectively (Fig. 3a).

Diagnostic accuracy of SSM for the detection of HREV
The diagnostic accuracy of SSM for HREV was evaluated in 17 studies. HREV were variably defined in the included studies (  (Table 3). Significant heterogeneity among studies was observed in DOR (p < 0.001). The Deeks' plot showed that there was no potential publication bias for the studies (p = 0.60, 0.95, 0.15, 0.14) (Supplementary Fig. 1).

Results of meta-regression and subgroup analysis
Univariate meta-regressions showed that the types of elastography technique, study location, study design, prevalence of diseases, etiology of CLD, proportion of Child A, and success rate of SSM were associated with the heterogeneity. SSM showed better performance for the diagnosis of any EV in Asian populations than in European  probability of correctly detecting CSPH following a "positive" measurement and lowering the probability of disease to 15% when "negative" measurement; and the probability of correctly diagnosing SPH following a "positive" measurement reached 84%. However, the probability of a correct diagnosis rate did not exceed 80% for diagnosing any EV and HREV when the pre-test probability was 50% (Table 3).
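The post-test probabilities read from a Fagan plot follow from Bayes' theorem in odds form. The sketch below reproduces that arithmetic for a pre-test probability of 50%; it derives the likelihood ratios directly from the pooled sensitivity and specificity, which is an approximation of the model-based likelihood ratios used in the analysis, and the function name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled estimates for CSPH reported in this meta-analysis.
sens, spec = 0.85, 0.86
plr = sens / (1 - spec)   # positive likelihood ratio
nlr = (1 - sens) / spec   # negative likelihood ratio

positive = post_test_probability(0.5, plr)
negative = post_test_probability(0.5, nlr)
print(f"after positive SSM: {positive:.2f}")  # 0.86
print(f"after negative SSM: {negative:.2f}")  # 0.15
```

The negative-test value agrees with the 15% quoted above for CSPH, and the same calculation with the pooled SPH estimates (0.84 and 0.84) gives the 84% quoted for a positive measurement.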

Discussion
The results of this meta-analysis indicate that spleen stiffness measured by current techniques has fairly good accuracy for the detection of PH and EV in CLD patients. AUCs for the diagnosis of CSPH and SPH exceeded 90%, and AUCs for the diagnosis of any EV and HREV reached 87% and 83%, respectively. SSM was able to predict the presence of CSPH with good sensitivity and specificity (85% and 86%, respectively). Notably, the pooled sensitivity and NPV of SSM for detecting HREV were fairly good, at 0.87 (95% CI, 0.77-0.93) and 0.88 (95% CI, 0.81-0.95), respectively, which suggests that HREV could be ruled out in most CLD patients evaluated by SSM, thereby avoiding unnecessary endoscopy. PH results in progressive splenomegaly and splenic remodeling through passive congestion, increased arterial blood flow, and fibrogenesis, all of which may increase spleen stiffness; this lends support to the physiological plausibility of SSM for detecting PH and EV [51,52]. Previous studies have confirmed that USE shows good diagnostic performance for significant liver fibrosis and liver cirrhosis [53,54]. MRE is a newly developed method to quantitatively evaluate the elasticity of living tissue that provides full-field-of-view elastograms of the abdomen with excellent diagnostic accuracy for staging hepatic fibrosis [55,56]. Studies have demonstrated that MRE-based spleen stiffness is strongly associated with the presence of EV, and with a cutoff value of 7.23 kPa, SSM showed good performance for detecting EV in cirrhosis patients, with an AUC of 0.83 (95% CI, 0.76-0.89) [33,38]. In the past several years, MRE-based spleen stiffness has been suggested as a valid parameter to identify the presence of EV [57].
The prevalence of varices needing treatment (VNT) is very low in patients with compensated cirrhosis [58]. Previous studies suggest that liver stiffness measurement (LSM) plus platelet count can be used to exclude the presence of HREV in patients with Child-Pugh A cirrhosis [59]. However, the performance of LSM alone in predicting PH is controversial because results have been inconsistent. This may be because LSM is affected by confounding factors, such as hepatocyte inflammation and cholestasis, and because it reflects only the increase in intrahepatic resistance to portal blood flow and cannot account for dynamic changes in splanchnic blood flow [8].  [61]. The increase in the missed diagnosis rate may be due to the prevalence of HREV, which is significantly greater in our meta-analysis than in the cohort of the Expanded-Baveno VI criteria (29.9% vs. 9.9%); the NPV is affected by the prevalence of disease. When the prevalence is high, the NPV is relatively low, resulting in an increased rate of missed diagnosis. Accordingly, our meta-analysis demonstrated that SSM was useful for ruling out the presence of HREV in CLD patients, and a new model combining SSM with other noninvasive criteria would probably safely avoid more endoscopies [62].
Considerable heterogeneity was observed in our study, and a meta-regression analysis was performed to identify probable causes. We observed that the diagnostic performance of SSM for detecting any EV was better in Asian populations than in European populations. Previous studies have shown that BMI and central obesity are independent factors influencing the failure and unreliability of USE [63]. The mean BMI of the European subjects was higher (range: 23.0-27.0 kg/m²) than that of the Asian subjects (range: 20.8-24.6 kg/m²). In addition, compared with the studies with a success rate of SSM < 90%, the studies with a success rate ≥ 90% had a lower specificity for detecting any EV. This may be related to the thickness of the spleen, which may have affected the success rate of SSM: when the thickness of the spleen was less than 4 cm, the success rate of SSM was low. Furthermore, the prevalence of EV increases with the degree of splenomegaly, which would lead to a decrease in the specificity of detection.
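The dependence of NPV on prevalence noted in the discussion of HREV can be made concrete with a short calculation. This is a minimal sketch that holds the pooled sensitivity and specificity for HREV fixed and varies only the prevalence; it will not reproduce the summary PPV and NPV reported above, which were pooled across studies, and the function name is hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV for a test with fixed sensitivity and specificity."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Pooled sensitivity and specificity of SSM for HREV from this meta-analysis.
sens, spec = 0.87, 0.66

# Prevalence in this meta-analysis vs. the Expanded-Baveno VI cohort.
for prevalence in (0.299, 0.099):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same test characteristics, the NPV is lower at the higher prevalence, which is the mechanism proposed above for the increased rate of missed diagnosis.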
The main strength of our study is that we comprehensively evaluated the diagnostic accuracy of spleen stiffness, measured by different techniques including USE and MRE, across a variety of populations and chronic liver diseases. Therefore, the results of our meta-analysis should reflect the diagnostic performance of SSM for detecting PH and EV in real-world settings. In addition, we separately assessed the diagnostic accuracy of SSM in detecting CSPH, SPH, any EV, and HREV, in order to evaluate the clinical application value of SSM comprehensively.
There were several limitations in this study. First, a considerable amount of heterogeneity was detected across the included studies, attributable to the type of elastography technique, study location, study design, the prevalence of disease, and several other covariates that were not recorded in the included studies. Second, the number of eligible studies was relatively low, with only 3 studies having assessed MRE, and some studies with relatively small samples were included in our meta-analysis. In the future, large-sample and multicenter studies are needed for more comprehensive evaluation. In addition, our meta-analysis included only studies written in English, putting the results at risk of language bias. Considering these limitations, caution must be taken when interpreting the results of our study.
In conclusion, SSM is a promising method for detecting PH and EV with good diagnostic accuracy, and it would be a helpful noninvasive surveillance tool for clinicians in the management of CLD patients. In addition, SSM could rule out the presence of HREV in most CLD patients and could be used as an initial screening method, thereby avoiding unnecessary endoscopy. Future prospective studies with larger sample sizes and in diverse clinical settings are required to further assess the effectiveness of SSM.

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Xing Hu, MD.

Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry One of the authors, Professor Jianhua Hou, has significant statistical expertise.
Informed consent Written informed consent was not required for this study because this study was a meta-analysis.
Ethical approval Institutional review board approval was not required because this study was a meta-analysis.

Methodology
• Diagnostic accuracy test
• Systematic review
• Meta-analysis
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.