Introduction

Acute cholecystitis (AC) is one of the most common diseases in emergency departments (EDs), occurring in 3–10% of patients with acute abdominal pain [1]. It generally results from cystic duct obstruction by a gallstone, followed by inflammation of the gallbladder (GB) [2].

Diagnostic imaging modalities for AC include ultrasound (US), computed tomography (CT), and hepatobiliary iminodiacetic acid (HIDA) scan [3]. A 2012 meta-analysis reported that the HIDA scan had the highest diagnostic accuracy for AC [4]. However, US involves no ionizing radiation, is easily accessible, and is inexpensive, which has made it the first-line diagnostic tool in emergency settings. In recent years, there has been a significant increase in the number of publications regarding the use of US for the diagnosis of AC, but an up-to-date synthesis of this evidence is still lacking. Further, US has traditionally been performed by radiologists. It is unclear whether the diagnostic performance differs when US is performed by other sonographers such as emergency physicians (EPs) or surgeons.

Hence, we aimed to perform a meta-analysis to investigate the diagnostic performance of US for AC.

Methods

This meta-analysis adhered to the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) Statement [5]. The meta-analysis protocol was registered in PROSPERO (CRD42023425075). Ethics committee review was waived by the study institution.

Search strategy and study selection

To identify relevant articles for our study, we conducted a comprehensive search in three databases: MEDLINE, Embase, and Cochrane Library. The search included articles published before August 2023, without any language restrictions. We employed a search strategy combining the keywords "bedside US", "emergency US" or "point-of-care US" with "AC". Two reviewers (SSH and KWL) independently screened the titles and abstracts of the retrieved articles to identify suitable studies. The inclusion criteria encompassed articles investigating the diagnostic performance of US for AC. We excluded case reports, case series, editorials, and review articles, as well as studies focused on acalculous cholecystitis. The complete literature search strategy is available in Additional file 5: Table S1.

Data extraction and quality assessment

The quality of the included studies was evaluated by two independent reviewers (SSH and KWL) using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [6]. Any discrepancies between the reviewers were resolved through discussion involving a third author (WCL).

Data synthesis and analysis

We extracted and summarized data from each study into 2 × 2 contingency tables to perform sensitivity and specificity analysis. To mitigate bias in the presence of zero observations in false-positive or false-negative results, we applied a continuity correction of 0.5. Summary estimates of sensitivity, specificity, predictive values, likelihood ratios, and accuracy along with their 95% confidence intervals (CIs) were calculated using a bivariate random-effects model with restricted maximum likelihood estimation for diagnostic meta-analysis [7]. The forest plot was used to visually represent the pooled summary estimates and their 95% CIs.
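The per-study step described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study (which was run in R with a bivariate random-effects model); the class name `TwoByTwo` and the example counts are hypothetical, and the sketch assumes the common convention of adding 0.5 to every cell whenever any cell is zero.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """One study's 2 x 2 contingency table (hypothetical helper class)."""
    tp: float  # true positives
    fp: float  # false positives
    fn: float  # false negatives
    tn: float  # true negatives

    def corrected(self, k: float = 0.5) -> "TwoByTwo":
        """Add k to every cell if any cell is zero (assumed convention)."""
        if 0 in (self.tp, self.fp, self.fn, self.tn):
            return TwoByTwo(self.tp + k, self.fp + k, self.fn + k, self.tn + k)
        return self

    def sensitivity(self) -> float:
        """Proportion of diseased patients with a positive test."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Proportion of non-diseased patients with a negative test."""
        return self.tn / (self.tn + self.fp)


if __name__ == "__main__":
    # Made-up counts with zero false positives, so the correction applies.
    study = TwoByTwo(tp=30, fp=0, fn=10, tn=60).corrected()
    print(f"sensitivity = {study.sensitivity():.3f}")
    print(f"specificity = {study.specificity():.3f}")
```

Without the correction, a study with zero false positives would have a specificity of exactly 1, whose logit is undefined; the correction keeps such studies usable in a model fitted on the logit scale.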

Additionally, we performed a subgroup analysis to assess the diagnostic performance of US among different sonographers, namely EPs, surgeons, and radiologists. Furthermore, we conducted a separate subgroup analysis to investigate the diagnostic accuracy of various sonographic findings in diagnosing AC.

To measure heterogeneity between the included studies, we utilized the inconsistency index I². Additionally, we assessed publication bias using Deeks' test [8]. Statistical significance was defined as a p value < 0.05. All analyses were conducted using R software version 4.3.0 (R Foundation for Statistical Computing, Vienna, Austria).
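For readers unfamiliar with the inconsistency index, the following Python sketch shows the standard univariate calculation from Cochran's Q. It is an illustration only: the scale on which heterogeneity was computed in this study is not stated here, so the inputs (per-study effect estimates and their variances) and the function name `i_squared` are assumptions.

```python
def i_squared(estimates, variances):
    """Return the inconsistency index I² (in percent) from Cochran's Q.

    estimates: per-study effect estimates on a common scale (assumed input)
    variances: the corresponding within-study variances
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # I² is the share of total variation beyond chance, floored at zero.
    return max(0.0, 100.0 * (q - df) / q)


if __name__ == "__main__":
    # Made-up example: two studies with very different estimates.
    print(i_squared([0.0, 2.0], [0.1, 0.1]))
```

Values close to 100% indicate that most of the observed variation between studies reflects genuine differences rather than sampling error.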

Results

Figure 1 depicts a flowchart that outlines the inclusion and exclusion process. A total of 1309 studies were identified through MEDLINE, Embase, Cochrane Library, and manual searches of the reference list of the included articles. After the initial screening and removal of duplicates, 60 studies were left for full-text article review. Among them, 20 studies were excluded during the full-text review as they did not present relevant findings on the topic or report the diagnostic accuracy of US. Consequently, 40 studies were included for data extraction and meta-analysis [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]. We also generated a summary receiver operating characteristics (SROC) curve to assess the performance of US in detecting AC (Additional file 1: Fig. S1).

Fig. 1 The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram

Quality of the included studies

Figure 2 shows the risk of bias and applicability of the included studies. In the risk of bias assessment, most studies had a low risk of bias in every domain except patient selection. In that domain, more than 25% of the studies were identified as having a high risk of bias because they used a case–control design or enrolled patients non-consecutively or non-randomly. Turning to applicability, most studies raised low concern in the reference standard and index test domains. Nevertheless, some studies were deemed to raise high applicability concerns because they exclusively focused on post-cholecystectomy patients instead of a more diverse and representative patient cohort.

Fig. 2 The summary of the quality assessment results of the included studies

Diagnostic performance of US

Across all 40 studies, a total of 8652 patients were included, with an average age of 45.9 years; 34% were male. Detailed information about the included studies can be found in Table 1. The overall sensitivity was 71% (95% CI, 69–72%), while the specificity was 85% (95% CI, 84–86%) (Table 2, Fig. 3). The positive likelihood ratio (PLR) was 4.80 (95% CI, 3.33–6.78), and the negative likelihood ratio (NLR) was 0.33 (95% CI, 0.25–0.41). The accuracy was 0.83 (95% CI, 0.82–0.83), demonstrating good diagnostic performance. Heterogeneity among the studies was high (I² = 89.7%; 95% CI, 87–92%), which could be due to varying patient enrollment criteria and potential confounders arising from non-randomized assignment in the included studies. No significant publication bias was detected through Deeks' test (p value = 0.39).
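To show how likelihood ratios relate to sensitivity and specificity, the Python sketch below applies the textbook formulas to the pooled point estimates. The results are close to, but not identical with, the reported PLR and NLR, which is expected if the reported values were pooled by the model rather than derived from the summary sensitivity and specificity. The function names and the 10% pre-test probability are illustrative assumptions, not part of the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    plr, nlr = likelihood_ratios(0.71, 0.85)
    print(f"PLR from point estimates = {plr:.2f}")
    print(f"NLR from point estimates = {nlr:.2f}")

    # Assumed pre-test probability of 10%, chosen only for illustration.
    pre = 0.10
    print(f"after positive US: {post_test_probability(pre, 4.80):.2f}")
    print(f"after negative US: {post_test_probability(pre, 0.33):.2f}")
```

Under that assumed pre-test probability, a positive US raises the probability of AC several-fold while a negative US lowers it to a few percent, which is one way to read the reported likelihood ratios.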

Table 1 Detailed information on the included studies
Table 2 The pooled estimates of diagnostic performance of ultrasound for acute cholecystitis
Fig. 3 The forest plot of diagnostic performance of ultrasound for the diagnosis of acute cholecystitis

Subgroup analysis for sonographers

The subgroup analysis included 14 studies involving EPs, 3 studies involving surgeons, and 18 studies involving radiologists. Two studies compared the performance between EPs and radiologists, and one compared EPs with surgeons, while 8 did not provide detailed information on the sonographers.

The pooled sensitivity and specificity of US were 71% (95% CI, 67–74%) and 92% (95% CI, 90–93%) when performed by EPs, 79% (95% CI, 71–85%) and 76% (95% CI, 69–81%) when performed by surgeons, and 68% (95% CI, 66–71%) and 87% (95% CI, 86–88%) when performed by radiologists, respectively (Additional files 2, 3 and 4: Figs. S2, S3 and S4 and Table 2). There were no statistically significant differences in sensitivity, specificity, PLR, NLR, or accuracy among the three groups.

Subgroup analysis of sonographic findings

The sonographic findings and their relationship to the diagnosis of AC are summarized in Table 3. Notably, not all of the included studies provided detailed information regarding individual sonographic findings.

Table 3 The pooled estimates of the sonographic finding for the diagnosis of acute cholecystitis

Discussion

We performed a systematic review and meta-analysis to investigate the diagnostic performance of US for AC. Forty studies with a total of 8652 patients were included. To the best of our knowledge, this is currently the largest meta-analysis on this topic, and it provides updated evidence.

Our results showed that US had a sensitivity of 71%, a specificity of 85%, and an accuracy of 0.83, indicative of good discriminability. Also, the sensitivity and specificity were similar among examinations performed by EPs, surgeons, and radiologists. Further, the presence of gallstones had a higher sensitivity for AC. However, most of the studies used combinations of sonographic findings for the diagnosis of AC.

Clinical symptoms and signs of AC have varying sensitivity and specificity [49]. The Tokyo guidelines suggest using imaging studies such as US, CT, and HIDA scans for the diagnosis of AC, in conjunction with detailed history, complete clinical examination, and laboratory tests [3]. Although HIDA has excellent diagnostic performance for AC with a sensitivity and specificity above 90% [4], its utilization is limited in emergency practice due to the required resources, time, and exposure to radioactive isotopes [50]. By contrast, US is a valuable tool because it is non-ionizing, low-cost, and easy to use. US is considered the first-line imaging modality in recently published guidelines for the diagnosis of AC [3, 50]. Our review provides evidence that US is a good diagnostic tool with discriminative power.

The American College of Emergency Physicians states that US is an essential skill in emergency practice, and GB-US is included among the 12 core applications [51, 52]. It also indicates that 25 sonographic examinations of the GB should be performed as a minimum requirement for training and accreditation [51]. In recent years, US has been widely used and increasingly integrated into emergency practice. There has also been a rising number of studies regarding EP-performed US.

In our review, the diagnostic performance was similar between EPs and radiologists. Half of the 14 studies in which EPs performed US reported the training background [10, 12, 13, 23, 27, 29, 34]; however, the level of training could range from novices (first-year residents) to attending physicians [10]. Summers et al. [29] reported an intraclass correlation coefficient of 0 (95% CI, 0–0.13), suggestive of similar performance at different levels. Although the inter-rater reliability was not thoroughly evaluated in the majority of the studies regarding EP-performed US, EPs could achieve proficiency in using US as a part of physical examination for the assessment of GB diseases [26].

Moreover, US also demonstrated time efficiency in several studies [9, 21, 25]. The mean time interval between surgeon-performed US and surgery was significantly shorter than that between radiologist-performed US and surgery (2.3 vs. 11.9 h) [9]. Similar results were observed between those receiving radiologist-performed US and HIDA scans [21, 25]. However, evidence regarding the effect of EP-performed US on expediting the clinical management process or on patient-centered outcomes (length of stay and mortality) of patients with AC is still lacking.

In our review, the presence of gallstones exhibited optimal performance for the diagnosis of AC. However, most of the included studies used the combination of the presence of gallstones with at least one additional inflammatory sign such as GB wall thickening, peri-GB fluid, or a sonographic Murphy sign. Moreover, refinements have been reported in the use of US to evaluate patients with right upper quadrant pain and suspected AC. Wertz et al. [20] reported that a transverse GB dimension of more than 4 cm was found in 59% of their 60 patients with AC. Perez et al. [15] found that a cystic artery velocity of more than 40 cm/s had a high specificity of 94% for AC. However, these results are still inconclusive and need further investigation.

This study has several limitations. First, a high risk of bias and applicability concerns in patient selection existed in more than one-fourth of the studies, limiting the generalizability. However, our study is by far the most comprehensive systematic review regarding US for the diagnosis of AC. Second, the majority of studies were conducted in Western countries, so the results should be extrapolated cautiously to Asian patients. Third, the details of comorbidities and body mass index were lacking across the studies; thus, factors associated with false-negative and false-positive cases could not be thoroughly analyzed. Fourth, acalculous cholecystitis accounts for approximately 10% of patients with AC [53, 54]. However, AC was diagnosed in this review using criteria that included the presence of gallstones, so caution is warranted when extrapolating the results to patients with acalculous cholecystitis. Last, patients should fast for at least 6 h before US for better visualization of the GB. However, most studies did not provide information on whether the patients were fasting or not, and ED patients may present after a large meal. The diagnostic performance of US may therefore be affected in non-fasting patients.

Conclusion

US is a good imaging modality for the diagnosis of AC with discriminative power. EP-performed US has a diagnostic performance similar to that of radiologist-performed US. Further investigations are needed on the impact of US on the clinical management process and patient-centered outcomes.