Introduction

Acute appendicitis is a common surgical emergency in adults, with a worldwide incidence estimated to be between 100 and 206 cases per 100,000 person-years [1]. Urgent appendectomy has been the traditional treatment approach for decades, with over 95% of cases managed surgically [2, 3]. However, nonoperative management (NOM) with antibiotic therapy has recently emerged as an alternative treatment strategy for uncomplicated appendicitis (i.e., appendicitis without gangrene or perforation), offering several benefits [4]. A concern is that NOM failure and recurrent appendicitis are reported in 12–39% of patients [5,6,7,8]. Therefore, careful patient selection and monitoring are crucial when considering NOM as a treatment option for adult appendicitis.

An appendicolith, a calcific material within the appendix seen on imaging studies [9, 10], has consistently been associated with complicated appendicitis in clinical, imaging, and pathological studies [5, 11,12,13,14]. Appendicoliths have been identified as independent predictors of failed NOM and recurrent appendicitis in patients with uncomplicated disease [15,16,17,18]. As a result, the presence of an appendicolith in adult appendicitis may exclude patients from NOM [19, 20]. Recently, the World Society of Emergency Surgery [21] issued a guideline cautioning against nonsurgical treatment of appendicolith appendicitis. However, not all cases of appendicolith appendicitis are complicated, with appendicoliths present in 13.8–23.0% of those with acute uncomplicated appendicitis [17, 22]. Other features of appendicoliths, such as diameter and location, were suggested as predictors of complications in the study of Ishiyama et al. [23]. However, it remains uncertain whether appendicolith characteristics could be used as selection criteria allowing NOM in a subset of patients with apparently uncomplicated appendicitis. Therefore, our study aimed to investigate the association between appendicoliths and complicated appendicitis and to identify differences in appendicolith characteristics between patients with complicated and uncomplicated appendicitis. Another aim was to explore the accuracy of portovenous-phase CT alone in the detection of appendicoliths compared to the combination of noncontrast and portovenous phases.

Materials and methods

Study design and patient selection

This retrospective cross-sectional investigation was performed at a tertiary-care urban academic hospital with a capacity of 2200 beds. The hospital’s Institutional Review Board approved the study (protocol no. SIRB 198/2564 (IRB1)) and waived the requirement for informed consent due to its retrospective nature. Consecutive adult patients who underwent appendectomy with a pathologically confirmed diagnosis of acute appendicitis and available preoperative CT were included. Patients were excluded if they had no clinical data available (n = 8), had CT performed without intravenous contrast (n = 1), or if the appendix was not identified on CT (n = 1). Note that while our previous investigation, which had a different objective and endpoint [24], included a subset of 201 patients from this cohort, this investigation analyzed all 324 patients with available preoperative CT. This number met the sample size calculated initially, based on a prevalence of appendicoliths of at least 30% with a 95% confidence level and 5% allowable error. The flowchart of patient inclusion is provided in Fig. 1.
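The sample size statement above can be checked with the standard formula for estimating a single proportion, n = z²p(1 − p)/d². The source does not state which formula was used, so the formula itself, the function name `sample_size_for_proportion`, and z = 1.96 for the 95% confidence level are assumptions made for this sketch.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Minimum n to estimate a proportion p within +/- margin.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / margin^2.
    This formula is an assumption; the source does not name its method.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Values taken from the text: 30% prevalence, 5% allowable error, 95% confidence.
n_required = sample_size_for_proportion(p=0.30, margin=0.05)
n_analyzed = 324  # patients analyzed in this investigation

print(n_required, n_analyzed >= n_required)
```

Under these assumptions the required size is just below the 324 patients analyzed, which is consistent with the statement that the precalculated sample size was met.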

Fig. 1 Flowchart of patient inclusion

Clinical data, image acquisition, reinterpretation, and definitions

Demographic data, time points (among symptoms, CT, and treatments), hospital length of stay, signs and symptoms, laboratory data, Alvarado score, type of appendectomy, and operative and pathological results were collected from the electronic medical records. CT scans were performed on one of our three multidetector scanners (64-slice LightSpeed VCT, 64-slice Discovery CT750 HD, or 256-slice Revolution CT, all from GE Healthcare). The scan coverage extended from either the top of the hemidiaphragms or the kidneys to the pubic symphysis. Scans were first performed without intravenous contrast (“noncontrast” phase), followed by administration of nonionic contrast medium via injector at a rate of 2 mL/s and a volume of 100 mL or 2 mL/kg, with a scan delay of 70–80 s (“portovenous” phase). The scan parameters were 120 kVp with 300 mAs for 64-MDCT or 250 mAs for 256-MDCT. Oral and rectal contrast media were not administered. Images of 1.25-mm slice thickness in both the noncontrast and portovenous phases were sent to the Picture Archiving and Communication System (PACS) for viewing. Two radiologists (one emergency radiologist and one abdominal radiologist, both with 20 years of experience) independently reviewed the portovenous phase first for the presence of appendicoliths. If an appendicolith was absent in this phase, the noncontrast phase was then evaluated. When an appendicolith was present, the number, signs of obstruction, and location were assessed. All discrepancies were resolved by consensus. The detailed CT appearances of appendicitis in patients with appendicoliths shown only on the noncontrast phase but not the portovenous phase were re-reviewed using the same method. The measurements of appendicoliths and their surrounding tissues for size and CT numbers were performed on noncontrast-phase images by an emergency radiologist with 20 years of experience (example in Fig. 2). The measurements of appendix diameter were performed on axial portovenous-phase CT images. The definitions of CT findings and measurements are provided in Supplementary Material 1.

Fig. 2 Measurements of appendicoliths (1), perimeter of air abutting appendicoliths (2), and soft tissue to the side of appendicoliths (3) using PACS tool. A, area (cm²); L, length (cm); Len Max, maximum length (cm); Len Min, minimum length (cm); M, mean Hounsfield unit (HU); Max, maximum HU; Min, minimum HU; P, perimeter (cm); SD, standard deviation (HU)

Categorization of appendicoliths by machine learning

Axial CT images in a Digital Imaging and Communications in Medicine format were selected at the mid-point of each appendicolith and set to a specific window level and window width of 60 and 225 Hounsfield units, respectively. A Portable Network Graphics image of each appendicolith was obtained and then manually segmented. Because the largest appendicolith did not exceed 36 pixels, the 36 × 36 pixel cropped images centered on the appendicolith were obtained. Any smaller images were resized to 36 × 36 pixels.
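The windowing and cropping step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the slice is already available as a 2D list of Hounsfield units, it omits DICOM reading, manual segmentation, and the resizing of smaller images, and the zero-padding at image borders is an assumption. The function names `apply_window` and `center_crop` are hypothetical.

```python
def apply_window(hu_rows, level=60.0, width=225.0):
    """Map Hounsfield units to 8-bit gray levels for a window level/width."""
    low = level - width / 2.0
    high = level + width / 2.0
    windowed = []
    for row in hu_rows:
        out_row = []
        for hu in row:
            clipped = min(max(hu, low), high)  # clamp to the window
            out_row.append(round((clipped - low) / (high - low) * 255))
        windowed.append(out_row)
    return windowed

def center_crop(rows, center_r, center_c, size=36):
    """Crop a size x size patch centered on (center_r, center_c).

    Pixels falling outside the image are filled with 0 (an assumption).
    """
    half = size // 2
    patch = []
    for r in range(center_r - half, center_r - half + size):
        patch_row = []
        for c in range(center_c - half, center_c - half + size):
            inside = 0 <= r < len(rows) and 0 <= c < len(rows[0])
            patch_row.append(rows[r][c] if inside else 0)
        patch.append(patch_row)
    return patch

# Hypothetical 64 x 64 slice of soft tissue (0 HU) with one dense pixel.
slice_hu = [[0.0] * 64 for _ in range(64)]
slice_hu[32][32] = 400.0  # hypothetical hyperattenuating focus
patch = center_crop(apply_window(slice_hu), 32, 32)
print(len(patch), len(patch[0]), patch[18][18])
```

With a level of 60 HU and a width of 225 HU, values at or below −52.5 HU map to black and values at or above 172.5 HU map to white, so a calcified focus saturates while the surrounding soft tissue stays in the mid-gray range.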

We adopted the Semantic Clustering by Adopting Nearest neighbors (SCAN) model [25] because, at the time of writing, it was one of the state-of-the-art models for unsupervised image classification and image clustering (https://paperswithcode.com). This model consists of the following steps:

  1. Representation learning for semantic clustering, which was similar to contrastive learning.

  2. A semantic clustering loss. This technique trained a classifier model with the goal of assigning instances neighboring each other to the same class while maximizing entropy, which kept other clusters away in the representation space. Because the number of clusters had to be determined beforehand, we used the elbow method and silhouette method [26] with K-means clustering on the representation space obtained in step 1. We also applied principal component analysis [27]. This resulted in “2” as the most appropriate number of clusters in this investigation.

  3. Fine-tuning through self-labeling.

To improve the performance of medical image classification, especially on a small dataset, we also utilized transfer learning [28].
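The choice of the number of clusters described in step 2 can be illustrated with a small silhouette-based sketch. It is not the authors' implementation: the two-dimensional `embeddings` below are synthetic stand-ins for the learned representation space, the K-means routine is a plain Lloyd's algorithm written for this example, and the elbow method and principal component analysis are not shown.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher is better)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d)  # mean distance to nearest other cluster
                for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic embeddings (hypothetical): two well-separated groups.
rng = random.Random(42)
embeddings = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
embeddings += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)]

scores = {k: mean_silhouette(embeddings, kmeans(embeddings, k))
          for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k)
```

The number of clusters with the highest mean silhouette width is taken as the most appropriate one. On real embeddings the separation is rarely this clean, which is one reason to cross-check against the elbow method as the text describes.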

By applying this methodology, two categories or types of appendicoliths were identified, as presented in Fig. 3, which we referred to as type 0 (homogeneous and rounded appendicoliths) and type 1 (heterogeneous appendicoliths with central or peripheral hypoattenuation, and oval). Subsequently, “representative” appendicoliths of all patients were independently classified by a 2nd-year radiology resident and a 1st-year radiology resident based on this categorization, and any discrepancies were resolved by an emergency radiologist with 20 years of experience. The results (“type of appendicolith”) were then used as a part of the CT characteristics of appendicoliths.

Fig. 3 Type of appendicoliths suggested by machine learning algorithm

Reference standards

In this study, noncontrast-phase CT was regarded as the diagnostic reference for appendicoliths, which were defined as hyperattenuating foci with a diameter > 2 mm located either inside the appendiceal lumen or outside in fluid or fluid collection [10, 12]. The diagnosis of acute appendicitis was based on histopathological results, while complicated appendicitis, including gangrene and perforation, was diagnosed by either histopathology or surgical operative findings [29]. Histopathology was used to diagnose gangrene, while either histopathology or surgical operative findings were employed to diagnose perforation.

Statistical analysis

Descriptive statistics were used to analyze both qualitative and quantitative data. Categorical variables were presented as numbers or percentages, while continuous data were reported as either mean (standard deviation) or median (range) depending on their normal or skewed distribution.

Inferential statistics were employed to compare differences between two groups (patients without vs. with appendicolith, complicated vs. uncomplicated appendicitis, and appendicoliths not shown vs. shown on portovenous phase). The Pearson chi-square test, Yates continuity correction, or Fisher exact test was utilized for categorical variables, and the independent-sample t-test or Mann–Whitney U test was used for continuous variables summarized by means or medians, respectively. Univariable and multivariable analyses were performed with logistic regression, or with marginal logistic regression using generalized estimating equations (GEE) to account for correlation between appendicoliths in the same patient, to determine the independent predictors of the presence of appendicoliths or of non-detection of appendicoliths on portovenous-phase CT. The odds ratio (OR) and adjusted OR (aOR) with corresponding 95% confidence intervals (95% CI) were used to describe the strength and direction of each association. Factors with a p value of less than 0.1 in the univariable model were entered into the multivariable model. Cutoff values of continuous variables that may predict complicated appendicitis in patients with appendicoliths were derived. The area under the receiver operating characteristic (ROC) curve (AUC) with the corresponding 95% CI was used as a summary measure of discrimination.
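As an illustration of the discrimination measure named above, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive case has a higher value than a randomly chosen negative case, with ties counted as one half. The sketch below is not the SPSS procedure used in the study, the function name `auc_mann_whitney` is hypothetical, and the diameters are invented for demonstration only.

```python
def auc_mann_whitney(positives, negatives):
    """AUC as P(positive > negative) + 0.5 * P(tie), over all pairs."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical minimum diameters in mm (NOT study data).
complicated = [5.0, 6.2, 4.8, 7.1, 3.9]
uncomplicated = [4.1, 3.5, 5.0, 4.4]

auc = auc_mann_whitney(complicated, uncomplicated)
print(round(auc, 3))  # 15.5 favorable pairs out of 20 -> 0.775
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.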

The diagnostic performance of portovenous-phase CT in diagnosing appendicoliths was evaluated using sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, positive predictive value, negative predictive value, and accuracy. All analyses were performed using IBM SPSS Statistics for Windows, Version 26.0, and a p value of less than 0.05 was considered statistically significant.

Results

The study included 324 patients with a median age of 54 years (range, 18–94) and a female predominance (65.4%). The median Alvarado score was 7 (range, 1–10), and the mean appendix diameter measured on CT was 12 mm (SD, 2.9). Of the 324 patients, 127 (39.2%) had complicated appendicitis. Table 1 provides information and comparison between patients with and without appendicoliths. Univariable and multivariable analyses (Table 2) identified 5 independent factors associated with appendicoliths, including a shorter duration from emergency room arrival to antibiotics (aOR = 0.926; 95% CI: 0.865–0.991), rebound tenderness (aOR = 2.067; 95% CI: 1.258–3.398), larger appendix diameter on CT (aOR = 1.140; 95% CI: 1.039–1.250), longer length of stay for initial admission (aOR = 1.124; 95% CI: 1.007–1.253), and presence of complicated appendicitis (aOR = 2.289; 95% CI: 1.343–3.902).

Table 1 Patient characteristics between those with and without appendicoliths (n = 324)
Table 2 Multivariable analysis of factors associated with presence of appendicoliths (n = 324)

A total of 134 patients had at least one appendicolith, with 75 patients having complicated appendicitis and 59 having uncomplicated appendicitis. Except for the minimum diameter of appendicoliths, other CT characteristics (including type, number, presence of obstruction, location, size, and CT attenuation; Table 3) showed no significant differences between the appendicoliths found in patients with complicated vs. uncomplicated appendicitis. The areas under the ROC curve of the minimum and maximum diameters of appendicolith were 0.607 (95% CI: 0.510–0.704) and 0.566 (95% CI: 0.466–0.666), with p values of 0.03 and 0.19, respectively. The optimal cutoff value of the minimum diameter was identified at 4.5 mm (p = 0.03), which yielded 62 true positives, 38 false positives, 13 false negatives, and 21 true negatives. The sensitivity, specificity, positive predictive value, and negative predictive value with their respective 95% CI were 82.7% (72.6–89.6%), 35.6% (24.6–48.3%), 62.0% (52.2–70.9%), and 61.8% (45.0–76.1%), respectively. The optimal cutoff value of the maximum diameter was identified at 6.0 mm (p = 0.02), which yielded 64 true positives, 39 false positives, 11 false negatives, and 20 true negatives. The sensitivity, specificity, positive predictive value, and negative predictive value with their respective 95% CI were 85.3% (75.6–91.6%), 33.9% (23.1–46.6%), 62.1% (52.5–70.9%), and 64.5% (46.9–78.9%), respectively.
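The performance figures for the 4.5-mm cutoff can be recomputed from the four reported cell counts. The sketch below is an illustration rather than the authors' procedure. The source does not name the confidence-interval method, so the Wilson score interval used here is an assumption, although it yields bounds matching those reported for this cutoff. The function names are hypothetical.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half

def diagnostic_counts(tp, fp, fn, tn):
    """Return (numerator, denominator) pairs for each 2 x 2 metric."""
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }

# Cell counts reported for the 4.5-mm minimum-diameter cutoff.
counts = diagnostic_counts(tp=62, fp=38, fn=13, tn=21)
for name, (num, den) in counts.items():
    lo, hi = wilson_interval(num, den)
    print(f"{name}: {num / den:.1%} ({lo:.1%} to {hi:.1%})")
```

The same two functions can be applied to the counts for the 6.0-mm maximum-diameter cutoff (64, 39, 11, and 20).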

Table 3 Computed tomographic characteristics of appendicoliths in patients with complicated and uncomplicated appendicitis (n = 134)*

A total of 237 appendicoliths were found in these 134 patients. Sensitivities in the detection of appendicolith on the portovenous-phase CT were 88.2% per appendicolith and 82.1% per patient. There were 28 false negatives (per appendicolith) and 24 (per patient) (Table 4).
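The two sensitivities follow directly from the reported totals and false negatives. As a worked check, using only the numbers given in the text:

```latex
\[
\text{Sensitivity}_{\text{per appendicolith}}
  = \frac{237 - 28}{237} = \frac{209}{237} \approx 88.2\%,
\qquad
\text{Sensitivity}_{\text{per patient}}
  = \frac{134 - 24}{134} = \frac{110}{134} \approx 82.1\%.
\]
```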

Table 4 Diagnostic performance of portovenous phase computed tomography in the detection of appendicoliths using combined noncontrast and portovenous phases as a reference standard

Univariable and multivariable analyses (Supplementary Material 2 and Table 5) revealed four factors associated with false-negative results on the portovenous-phase CT. These included appendicoliths with a homogeneous appearance (aOR = 6.803; 95% CI: 1.202–38.462), smaller minimum diameter (aOR = 0.034; 95% CI: 0.002–0.591), smaller differences between maximum and minimum CT attenuation (aOR = 0.994; 95% CI: 0.990–0.999), and smaller differences between CT attenuation of appendicolith and surrounding soft tissues (aOR = 0.966; 95% CI: 0.943–0.989).

Table 5 Multivariable analysis of factors associated with non-detection of appendicolith in the portovenous phase computed tomography of adult patients with acute appendicitis (n = 237)

Twenty-four patients had 28 appendicoliths not detected on the portovenous phase (Fig. 4, Supplementary Material 3). Among these 24 patients, 16 had complicated appendicitis as confirmed by histopathology or surgical operative findings. CT correctly identified complications in 14 patients, while 5 were correctly identified as not having complications. There were 2 false negatives and 3 false positives during the re-review of CT images.

Fig. 4 Example of two cases with appendicoliths (arrows) undetected on portovenous-phase CT (A, C) but visible on noncontrast CT (B, D)

Discussion

This investigation highlights the importance of identifying appendicoliths in adult appendicitis due to their association with complications, such as gangrene or perforation, as well as significantly longer hospital stays. The only CT characteristic of appendicoliths significantly associated with an increased likelihood of complicated appendicitis was a larger minimum diameter. The sensitivities of portovenous-phase CT in detecting appendicoliths were 82.1% per patient and 88.2% per appendicolith. Unsurprisingly, the undetected appendicoliths had a small minimum diameter and a homogeneous appearance, and were less distinguishable from surrounding soft tissues. These overlooked appendicoliths, however, resulted in misclassification as uncomplicated appendicitis in only 1.6% of patients with complicated appendicitis (2 of 127), because other CT findings of complication were present on the portovenous-phase CT.

Association of appendicoliths with complicated appendicitis

The presence of appendicoliths has consistently been associated with complicated appendicitis in several studies. These studies suggest that appendicolith appendicitis has a higher severity of inflammation in pathological specimens than that without appendicoliths [14], a higher severity of inflammation on imaging [10], a higher prevalence of perforation [30], and an increased rate of failed NOM and complications [5, 11, 12, 18, 22]. Our investigation supports this notion by revealing that patients with appendicolith appendicitis were approximately twice as likely to have complicated appendicitis at pathology. Furthermore, we associated appendicoliths with markers of increased severity, such as rebound tenderness, a larger appendix diameter on CT, and a longer length of hospital stay, which are in line with previous studies [12, 22].

Although appendicoliths were found to be strongly associated with complicated appendicitis, several investigations reported a prevalence of appendicoliths in pathologically proven uncomplicated appendicitis of 13.8 to 23.0% [17, 22]. In our investigation, 44% of patients with appendicoliths (59 of 134) had uncomplicated appendicitis. Previous studies have suggested that appendicolith size [12, 23, 31] and location [23, 32] are associated with complications, and our investigation found that only the minimum diameter of appendicoliths was independently associated with complicated appendicitis. It is unclear why the minimum diameter, instead of the maximum diameter, is a predictor of complicated appendicitis. However, cutoff values (4.5 mm for the minimum diameter and 6.0 mm for the maximum diameter of appendicoliths) could be established for both diameters with reasonable AUCs and relatively high sensitivities but poor specificities. Previous investigations [12, 23, 31] identified the maximum diameter as an independent predictor of complicated appendicitis. A 5-mm cutoff value was reported as useful for suggesting nonoperative management [12] or predicting complicated appendicitis [23, 31], while a 10-mm cutoff value was suggested as a threshold for appendectomy [12]. As these investigations did not typically collect the minimum diameter, it is difficult to determine whether it would be a more accurate representation of the appendicolith’s diameter.

Diagnostic performance of portovenous-phase CT in the detection of appendicolith

The accurate noninvasive detection of appendicoliths in adults with appendicitis is important for determining whether NOM is a feasible option in an otherwise uncomplicated case. The diagnostic performance of contrast-enhanced CT in the detection of appendicoliths has been explored in detail in a few investigations that used surgical specimens and histopathology as a reference standard [30, 33]. These investigations identified a wide range of sensitivity (21–81%; overall 56%), specificity (78–96%; overall 86%), and radiologists’ agreement (kappa; 0.48–0.83) [33]. However, we believe that the poor CT performance in the detection of appendicoliths in these reports stemmed from the radiologic definition of an appendicolith as a “calcific” deposit [9, 10], which categorically excludes noncalcific intraluminal contents within an appendix from being classified as appendicoliths. This leads to a lower sensitivity of CT in this regard. In addition, using histopathology as a reference standard has limitations, including difficulties in differentiating between a fecalith and a calcific counterpart (i.e., appendicolith as per CT definition), variances in evaluating an appendiceal specimen vs. in situ CT, and loss of appendiceal contents during transfer of the specimen [14].

To address these limitations and facilitate practical prospective patient management, we utilized a combined noncontrast and portovenous-phase CT as a reference for identifying appendicoliths in our study, recognizing its imperfections. We used a prespecified standard definition that had thresholds for both size (> 2 mm; to allow accurate and reproducible detection) and CT attenuation (visibly higher than surrounding tissue) [9, 10] to reduce bias. Using CT as a reference also reflects real-world application as this is the case for many trials [5,6,7,8]. However, this approach comes at a cost of not knowing the implication of “noncalcified” contents within the inflamed appendix.

Previous studies have reported a prevalence of appendicoliths in adult appendicitis ranging from 33 to 38.7% [10, 34, 35]. However, our study found a higher prevalence at 41.4%. This difference may be attributed to our use of a broader definition of appendicoliths as high-attenuation materials rather than strict calcifications, and the utilization of combined noncontrast and portovenous-phase CT instead of a single portovenous-phase CT, which is typically used in previous investigations. Noncontrast CT is known to be superior to portovenous-phase CT in detecting calcifications, as demonstrated in studies on urolithiasis, cholelithiasis, and choledocholithiasis [36, 37]. Calcifications stand out more obviously relative to surrounding soft tissues in the noncontrast phase than in the portovenous phase. Our multivariable analysis of factors associated with appendicoliths not detected on portovenous-phase CT revealed that their CT attenuation was significantly closer to that of surrounding soft tissues. Additionally, they had a more homogeneous appearance (both visually and by measurement of differences between maximum and minimum CT attenuation), and a smaller minimum diameter.

Using this combined noncontrast and portovenous-phase CT, we identified a higher prevalence of appendicoliths than with portovenous-phase CT alone: 11.8% of appendicoliths (28 of 237) and 17.9% of patients with appendicoliths (24 of 134) were identified only with the addition of the noncontrast phase. Strict adherence to the World Society of Emergency Surgery guideline would therefore result in a higher number of appendicitis cases being excluded from NOM due to the presence of appendicoliths. It is important to acknowledge that implementing this guideline may lead to some patients with uncomplicated appendicitis being ineligible for NOM. This is to ensure a safe practice of recommending appendectomy for patients with uncomplicated appendicitis (with appendicolith) rather than resorting to NOM for those with complicated appendicitis. In fact, even when portovenous-phase CT alone was used to select patients for NOM, the overwhelming majority of patients with proven complicated appendicitis would still be correctly identified due to the presence of CT findings other than appendicoliths. In our cohort, only two out of 127 patients (1.6%) with complicated appendicitis would be misdiagnosed as having uncomplicated appendicitis on CT. Our investigation still supports the use of portovenous-phase CT, even when used alone without a noncontrast phase, as it can identify almost all cases of complicated appendicitis through findings other than appendicoliths, such as fluid collections and extraluminal air.

The study has several limitations. Firstly, it was a retrospective single-center study with a relatively small sample size (although it did reach a precalculated level). Additionally, a large proportion of our patients were older adults, which may explain the high rate of complication (almost 40%) detected in this investigation [38]. Secondly, many patients with appendicitis were excluded due to the use of alternative diagnostic methods, such as preoperative ultrasound or outside-hospital CT, or because they directly underwent surgery. These exclusions may affect the proportion of patients with and without complication, and limit the generalizability of the findings to other populations or settings. Since appendectomy remains the standard of care for appendicitis in our clinical practice, we believe that the likelihood of having excluded patients with uncomplicated appendicitis treated with NOM is minimal. Thirdly, not all potential clinical confounders were collected, which limits our ability to draw confident conclusions on the association between certain clinical parameters and the presence of appendicoliths. Fourthly, while the study suggested that the minimum diameter of appendicoliths was independently associated with complicated appendicitis, further studies are needed to confirm this finding. Fifthly, since the primary aim in selecting patients for NOM is to ensure that patients with complicated appendicitis are not mistakenly selected, identifying detailed appendicolith characteristics for their potential selection for NOM may be counterproductive. Lastly, while the study found that noncontrast-phase CT identifies more appendicoliths than the portovenous phase alone, the added value of identifying appendicoliths may be limited by other CT findings that can direct patients to a complicated group (i.e., for appendectomy).

In conclusion, our study found a significant association between appendicoliths and complicated appendicitis, in particular for appendicoliths with a larger minimum diameter. While the sensitivity of portovenous-phase CT in detecting appendicoliths was modest compared to combined noncontrast and portovenous-phase CT, the portovenous phase alone was sufficient in accurately identifying complicated appendicitis through the presence of other CT findings. Consequently, the routine use of combined noncontrast and portovenous-phase scans cannot be recommended. However, in older patients with suspected acute appendicitis under consideration for NOM, the inclusion of a noncontrast phase may be justified. Detecting appendicoliths in this patient subset can provide an objective means to diagnose complicated appendicitis, which is associated with increased morbidity and mortality as age advances [38]. Furthermore, the potential benefits of characterizing complicated appendicitis likely outweigh the lower risk of radiation-related cancer in older patients [39], making the inclusion of a noncontrast phase more justifiable for this age group. While our results provided insight into the selection of patients for NOM, further research is needed to validate the significance of appendicolith size in predicting complicated appendicitis. New techniques like dual-energy CT may offer direct findings of complicated appendicitis, but the potential of virtual noncontrast images as a substitute for true noncontrast images, reducing radiation exposure, remains uncertain.