Introduction

Acute appendicitis is common in patients visiting the emergency department (ED). The lifetime risk of developing acute appendicitis is approximately 7–10% [1]. Ultrasound (US) and computed tomography (CT) are widely used in the diagnostic work-up of ED patients clinically suspected of having acute appendicitis, and the use of imaging has increased substantially in these patients over the last few decades [2–4]. This increase is supported by good accuracy values reported in the literature. A recent meta-analysis of head-to-head comparisons of US and CT in patients with suspected appendicitis reported summary sensitivity estimates of 78% for US and 91% for CT [5].

Although more widespread use of imaging has led to a decline in negative appendectomy rates, the accuracy of imaging can be improved even further [3, 6]. The accuracy of US and CT is influenced by the radiologists who evaluate diagnostic imaging of the acute abdomen in daily practice. These radiologists arrive at their imaging diagnosis based on a set of imaging features described in the literature as criteria for the diagnosis of appendicitis, but without exact knowledge of the diagnostic accuracy of specific combinations of such features. These features consist of an increased appendiceal diameter, appendiceal wall thickening, peri-appendiceal fat infiltration, peri-appendiceal free fluid and the presence of an appendicolith. For US, one can add local transducer tenderness and non-compressibility of the appendix and surrounding fat. If all these features are present, the diagnosis of appendicitis can be made easily at imaging. In most cases, however, the diagnosis is less clear.

So far, the research reported in the literature has focused on the evaluation of the accuracy of single features for detecting appendicitis in patients who are clinically suspected of having appendicitis at the ED [2, 7]. In practice, the features do not occur in isolation, and it makes sense to evaluate a combination of features. Knowing the accuracy of such combinations of imaging features may further improve the value of imaging in patients with suspected appendicitis.

The purpose of this study was twofold: first, to evaluate, in isolation, US and CT imaging features presumably associated with appendicitis in unselected patients presenting with acute abdominal pain at the emergency department; second, to identify profiles of US and CT features that can help in detecting appendicitis with high diagnostic accuracy. In addition, we compared the actual weights of these imaging features with the weights assigned to these features by the radiologists evaluating the CT and US images.

Materials and methods

Patients

We invited consecutive patients with acute abdominal pain for more than 2 h and less than 5 days who presented at the emergency department (ED) in two university and four (large) teaching hospitals [8, 9] to participate in this study. Patients discharged from the ED by the treating physician without any diagnostic imaging (plain radiographs, US or CT), patients under 18 years, pregnant women, patients with blunt or penetrating trauma and patients in haemorrhagic shock caused by gastrointestinal bleeding or an acute abdominal aneurysm were not eligible for this study. Eligible patients were asked to provide written informed consent.

Because the study presented within this paper is a sub-analysis of a study which evaluates the additional value of imaging on top of clinical evaluation in patients presenting with acute abdominal pain at the emergency department [9], all consenting patients underwent a standardised diagnostic protocol, consisting of clinical evaluation, plain supine abdominal and upright chest radiography, abdominal ultrasound and abdominal CT. This study had been approved by the Institutional Review Boards of participating hospitals.

Image evaluation

All patients underwent ultrasound and computed tomography within a few hours of presentation at the ED. US and CT were independently evaluated by two different observers blinded to all other imaging findings obtained during the diagnostic work-up. After hours, when only one radiologist or radiological resident was present, US and CT were usually evaluated by the same observer. In these cases, the CT examination was re-evaluated the next morning by a radiologist, blinded to the results of the US evaluation of the same patient and all other patient data obtained in the diagnostic work-up.

Ultrasound

A general abdominal survey was performed with US. To standardise the US examination, the results were recorded on a digital case record form; the following potential appendiceal abnormalities on imaging were evaluated: could the appendix be completely visualised (meaning visualised from the base to the tip of the appendix), was there local transducer tenderness, a thickened appendix (diameter greater than 6 mm), a compressible appendix, an appendicolith, an intact layered wall structure, fat infiltration adjacent to the appendix or was there free fluid adjacent to the appendix? All observers recorded their US diagnosis, and if applicable, two differential diagnoses.

Computed tomography

CT protocols for the different CT systems in this multi-centre study were based on the following: effective mAs 165, 120 kV, (4×) 2.5-mm collimation, (4×) 3-mm slice width and 0.5-s rotation time, and 125 ml intravenous iodinated contrast at 3 ml/s after a 60-s delay; no oral or rectal contrast agents were used. Only patients with known renal failure underwent un-enhanced CT.

CT images were evaluated in the same standardised manner as the US examinations, and characteristics were assessed and recorded on a digital case record form. Potential appendiceal abnormalities on CT were: incomplete visualisation of the appendix (e.g. tip of the appendix not visualised), thickening of the appendix (diameter greater than 6 mm), visualisation of an appendicolith, increased appendiceal enhancement, fat infiltration adjacent to the appendix and free fluid adjacent to the appendix. A final CT diagnosis and, if applicable, two differential diagnoses were recorded.

Final diagnosis

An independent expert panel assigned a final diagnosis after 6 months [8, 9]. This expert panel consisted of two experienced gastrointestinal surgeons and an experienced abdominal radiologist, none of whom had been involved in the work-up or management of included patients. The panel members evaluated all available data for each patient, including follow-up of at least 6 months. If there was no consensus on diagnoses after individual evaluation, consensus was reached in a group discussion. The final diagnosis was set after at least 6 months based on all available clinical, laboratory, imaging, surgical, pathological and outcome data. For definite appendicitis the final diagnosis was mostly based on initially obtained surgery and histopathology reports.

Analysis

The final diagnosis was used as the reference standard in estimating the accuracy of imaging features, both for US and for CT. For each imaging feature we calculated the corresponding diagnostic odds ratio using univariate logistic regression analysis. An odds ratio of 1 indicates no association, with higher ratios pointing to stronger associations.
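As an illustration of this calculation: for a single binary feature, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2×2 table of feature against final diagnosis. The sketch below assumes that setting. The counts are hypothetical and not taken from this study, and the Wald confidence interval (Woolf's formula) is one common choice, not necessarily the one produced by the software used here.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    tp: feature present, appendicitis
    fp: feature present, no appendicitis
    fn: feature absent, appendicitis
    tn: feature absent, no appendicitis
    """
    odds_ratio = (tp * tn) / (fp * fn)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_, lower, upper = diagnostic_odds_ratio(tp=60, fp=20, fn=40, tn=80)
print(f"OR = {or_:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

A confidence interval that excludes 1 corresponds to a significant association at the 0.05 level under this approximation.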

Multivariate logistic regression analysis was used to evaluate the association of a combination of features with the final diagnosis, taking into account the contribution of each feature conditional on the presence or absence of the other features. All features with a significant odds ratio in the univariate analysis were included in this multivariate analysis. We then used a backward elimination strategy, removing variables with a negligible contribution to the multivariate model, to arrive at the most parsimonious model. We will refer to the features in the final US and the final CT model as essential imaging features. Both for inclusion and exclusion the significance level was set at 0.05.
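The backward elimination loop can be sketched as follows. This is a minimal outline, not the procedure implemented in the statistical package used for this study: `fit_p_values` is a hypothetical callable that would refit the logistic model on the selected features and return their p-values, and the toy stand-in below returns fixed, invented p-values so that the example runs without any regression library.

```python
def backward_elimination(features, fit_p_values, alpha=0.05):
    """Drop the least significant feature until all remaining p-values are below alpha.

    fit_p_values: callable taking a list of feature names and returning a
    dict {feature: p_value} for a model refitted with exactly those features.
    """
    selected = list(features)
    while selected:
        p_values = fit_p_values(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining feature is significant
        selected.remove(worst)
    return selected


# Hypothetical stand-in for a model fit: fixed p-values, no real regression.
TOY_P_VALUES = {
    "thickened_appendix": 0.001,
    "fat_infiltration": 0.002,
    "appendicolith": 0.03,
    "free_fluid": 0.40,
}

def toy_fit(selected):
    return {name: TOY_P_VALUES[name] for name in selected}

essential = backward_elimination(list(TOY_P_VALUES), toy_fit)
print(essential)
```

In a real analysis the p-values change each time a variable is removed, which is why the model is refitted inside the loop.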

We also used logistic regression analysis to evaluate the weights implicitly assigned to the imaging features by the radiologists in their imaging diagnosis. Here, not the final diagnosis, but an imaging diagnosis of appendicitis was used as the outcome variable. By comparing the relative weights we were able to evaluate whether specific features were overvalued or undervalued by the radiologists. Differences between the weight assigned by the radiologist and the actual weight were evaluated and tested for significance using the z-test statistic.
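One common form of such a z-test compares two regression coefficients (log odds ratios) using their standard errors. The sketch below assumes that form and treats the two estimates as independent, which is a simplification when both models are fitted to the same patients. The coefficient values are hypothetical.

```python
import math

def z_test_coefficients(b_imaging, se_imaging, b_final, se_final):
    """Two-sided z-test for the difference between two log odds ratios.

    Assumes the two estimates are independent, which is a simplification
    when both models are fitted to the same patients.
    """
    z = (b_imaging - b_final) / math.sqrt(se_imaging ** 2 + se_final ** 2)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical coefficients (log odds ratios) and standard errors.
z, p = z_test_coefficients(b_imaging=2.4, se_imaging=0.4, b_final=1.2, se_final=0.3)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A positive z indicates that the weight implied by the imaging diagnosis exceeds the weight for the final diagnosis, i.e. the feature is overvalued.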

Based on the sets of essential features in the final US and CT models, we developed imaging profiles. These imaging profiles were defined as combinations of the essential imaging features that are either absent or present. For each profile, we counted the number of patients with that profile and the proportion of patients with that profile who had a final diagnosis of appendicitis. All analyses were performed using SPSS 15.0.1 statistics (SPSS Inc. Chicago, IL, USA).
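The tabulation of imaging profiles described above can be sketched as a simple grouping step. The feature names and patient records below are hypothetical placeholders, not study data.

```python
from collections import defaultdict

def profile_table(patients, features):
    """Count patients and appendicitis cases per feature combination."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [n patients, n appendicitis]
    for patient in patients:
        profile = tuple(bool(patient[name]) for name in features)
        counts[profile][0] += 1
        counts[profile][1] += int(patient["appendicitis"])
    return {
        profile: {"n": n, "appendicitis": k, "proportion": k / n}
        for profile, (n, k) in counts.items()
    }

# Hypothetical feature names and records, for illustration only.
features = ["thickened", "tenderness", "fat_infiltration"]
patients = [
    {"thickened": 1, "tenderness": 1, "fat_infiltration": 1, "appendicitis": 1},
    {"thickened": 1, "tenderness": 1, "fat_infiltration": 1, "appendicitis": 1},
    {"thickened": 1, "tenderness": 1, "fat_infiltration": 1, "appendicitis": 0},
    {"thickened": 0, "tenderness": 0, "fat_infiltration": 0, "appendicitis": 0},
    {"thickened": 0, "tenderness": 0, "fat_infiltration": 0, "appendicitis": 0},
    {"thickened": 1, "tenderness": 0, "fat_infiltration": 0, "appendicitis": 0},
]

table = profile_table(patients, features)
for profile, row in sorted(table.items(), reverse=True):
    print(profile, row["n"], f"{row['proportion']:.0%}")
```

Each key is one profile (a combination of present and absent features), and the proportion is the observed probability of appendicitis within that profile.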

Results

Patients

Between March 2005 and November 2006, 1,101 patients were included. The data for 80 patients had to be excluded from the analysis because of incomplete case record forms. Another 79 patients were excluded because they had already undergone appendectomy. The mean age of the remaining 942 patients was 47 years (range 19–94) and 515 (55%) of the patients were female. A total of 284 (30%) had a final diagnosis of acute appendicitis. In 271 of these 284 patients appendicitis had been proven histopathologically after appendectomy. Of the remaining 13 patients, 12 had been treated conservatively and one had an appendiceal abscess, which was treated with percutaneous drainage. US was performed by a supervised resident in 276 cases (29%), by an unsupervised resident in 264 (28%) and by a staff radiologist in the other 402 (43%). The appendix could not be assessed at US in 414 patients (44%), of whom 73 (18%) had a final diagnosis of appendicitis. This is 26% (73 out of 284) of all patients with a final diagnosis of appendicitis. On CT the appendix was not assessable in 63 patients (7%), of whom 8 (13%) had a final diagnosis of appendicitis.
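The counts and percentages in this paragraph can be recomputed directly from the reported numbers. The short check below uses only figures stated above; the helper `pct` is introduced here for illustration.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)

included = 1101
analysed = included - 80 - 79   # incomplete forms, previous appendectomy
appendicitis = 284
us_not_assessable = 414
ct_not_assessable = 63

print(analysed)                       # patients remaining in the analysis
print(pct(515, analysed))             # female patients
print(pct(appendicitis, analysed))    # final diagnosis of appendicitis
print(pct(us_not_assessable, analysed), pct(73, us_not_assessable), pct(73, appendicitis))
print(pct(ct_not_assessable, analysed), pct(8, ct_not_assessable))
print(analysed - us_not_assessable, analysed - ct_not_assessable)  # assessable on US, CT
```

The last line gives the denominators used for the US and CT imaging profiles (528 and 879 patients, respectively).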

Imaging features on US

The frequencies with which features were assigned by the radiologists on US are listed in Table 1. The diagnostic odds ratios of the isolated US features varied between 0.8 for non-compressibility of the appendix and 6.6 for a thickened appendix. In the multivariate model, after backward elimination, only local transducer tenderness, a thickened appendix and peri-appendiceal fat infiltration were significant at the 0.05 level (Table 1).

Table 1 Solitary features on US associated with appendicitis

Imaging features on CT

The frequencies of features assigned by the radiologists on CT are listed in Table 2. The diagnostic odds ratios ranged from 1.4 for peri-appendiceal free fluid to 10.7 for peri-appendiceal fat infiltration (Table 2). In the multivariate model, only peri-appendiceal free fluid was removed. The variables in the final model were: complete visualisation, a thickened appendix, an appendicolith, increased appendiceal enhancement and peri-appendiceal fat infiltration.

Table 2 Solitary features on CT associated with appendicitis

Radiologist-weighted value of imaging features

In Fig. 1, the weights of imaging features in the logistic regression analysis for the final diagnosis are compared with the corresponding weights for the imaging diagnosis of appendicitis. Many of the weights for the imaging diagnosis are higher than the corresponding weights for the final diagnosis, as the latter is based on additional information as well.

Fig. 1 Relative weight given to the imaging features by radiologists compared with the weight of imaging features for the final diagnosis of appendicitis

On US, the radiologist assigned the largest weights to a thickened appendix, a non-compressible appendix and visualisation of an appendicolith, whereas a thickened appendix and transducer tenderness were features with the largest actual weights for the final diagnosis of appendicitis.

On CT, peri-appendiceal fat infiltration and a completely visualised appendix had the highest actual weights for the final diagnosis of appendicitis. The radiologist assigned large weights to a thickened appendix and appendiceal enhancement next to peri-appendiceal fat infiltration. Non-compressibility on US and a thickened appendix on CT were given a significantly higher weight by the observers compared with the actual weight.

US imaging profiles

Imaging profiles based on essential features in the final models developed for patients in whom the appendix was assessable on US (n = 528) are shown in Table 3. Most patients with appendicitis fell within two imaging profiles. A final diagnosis of appendicitis was assigned in 139 of the 147 patients (95%) in whom the radiologists recorded a thickened appendix with local transducer tenderness and peri-appendiceal fat infiltration. Only 14 out of 309 (5%) patients with none of the essential imaging features had a final diagnosis of appendicitis. A flowchart of imaging features and profiles on US is provided in Fig. 2.

Fig. 2 Flowchart for US

Table 3 Profile of US features significantly associated with appendicitis

If an imaging diagnosis of appendicitis had been assigned whenever two or more of the essential imaging features were present on US, the sensitivity would be 92% (95% CI 89–96%) with a specificity of 83% (95% CI 79–88%).
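The "two or more essential features" rule and the resulting sensitivity and specificity can be sketched as below. The records are hypothetical, and the Wald interval is an assumption made for illustration; the method used to derive the confidence intervals reported above is not specified here.

```python
import math

def rule_positive(features_present, threshold=2):
    """Apply the 'two or more essential features' rule to one patient."""
    return sum(features_present) >= threshold

def proportion_with_ci(successes, total, z=1.96):
    """Proportion with a Wald 95% confidence interval, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def sensitivity_specificity(records, threshold=2):
    """records: iterable of (features_present, has_appendicitis) pairs."""
    tp = fn = tn = fp = 0
    for features_present, has_appendicitis in records:
        positive = rule_positive(features_present, threshold)
        if has_appendicitis:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return proportion_with_ci(tp, tp + fn), proportion_with_ci(tn, tn + fp)

# Hypothetical records: (thickened, tenderness, fat infiltration), final diagnosis.
records = (
    [((True, True, True), True)] * 9
    + [((True, False, False), True)] * 1
    + [((False, False, False), False)] * 8
    + [((True, True, False), False)] * 2
)
sensitivity, specificity = sensitivity_specificity(records)
print(f"sensitivity {sensitivity[0]:.0%}, specificity {specificity[0]:.0%}")
```

Raising `threshold` to 3 would trade sensitivity for specificity, which is the trade-off the profiles in Table 3 make explicit.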

CT imaging profiles

Five essential CT imaging features were used to create CT imaging profiles in patients in whom the appendix was assessable on CT (n = 879). Table 4 summarises the imaging profiles that contain most patients. An overview of all imaging profiles is provided in the Appendix. In the largest subgroup (119 out of 276 (40%) of patients with appendicitis) the radiologist had recorded a completely visualised, thickened appendix with peri-appendiceal fat infiltration and appendiceal enhancement. In this group 114 patients (96%) had a final diagnosis of appendicitis. Of those in whom the radiologist only recorded a completely visualised, thickened appendix, only 57% had appendicitis as the final diagnosis. Having two or more of the essential features had a sensitivity of 96% (95% CI 93–98%) with a specificity of 95% (95% CI 93–96%). Only 10 out of 649 patients (2%) with none of the five essential CT imaging features had a final diagnosis of appendicitis. A flowchart of imaging features and profiles on CT is provided in Fig. 3.

Fig. 3 Flowchart for CT

Table 4 Profile of CT features for the diagnosis of acute appendicitis

Discussion

We were able to identify essential imaging features, and profiles based on these features, that can be used to assign a high probability of appendicitis on US and CT. When two or more of the selected features are present, the sensitivity is 92% and 96% on US and CT, respectively. For US the probability of appendicitis was 95% in patients who had a thickened appendix with local transducer tenderness and peri-appendiceal fat infiltration. Most patients with appendicitis fell within this US profile. For CT, patients were distributed over more imaging profiles. In the largest subgroup of patients with appendicitis the appendix was completely visualised and thickened, with peri-appendiceal fat infiltration and appendiceal enhancement. In this subgroup 96% of patients had appendicitis. If patients had none of the essential imaging features on CT the probability of appendicitis was 2%, and if a completely visualised, thickened appendix was the only recorded feature the probability was 57%.

Radiologists gave all imaging features a diagnostic weight that was higher than the actual weight. Furthermore, radiologists assigned the highest weight to features other than those that had the highest actual weight for the final diagnosis of appendicitis. This difference between presumed and actual weight was largest for an appendicolith on US: whenever an appendicolith was visualised on US, the radiologist assigned the diagnosis of appendicitis. However, it is known from the literature that an appendicolith can be present in the absence of appendicitis [2]. The exact reason why non-compressibility on US and a thickened appendix on CT were implicitly given a significantly higher weight by radiologists compared with the actual weight is not known; however, both features figure prominently in textbooks dealing with appendicitis. We used a cut-off diameter of 6 mm in defining a thickened appendix. As a larger diameter of the appendix can occur physiologically, this may be another explanation for the overestimation of its actual diagnostic value.

This study has some potential limitations. The appendix was assessable on US in only a little more than half (56%) of the patients, which excluded 73 (26%) of the 284 patients with a final diagnosis of appendicitis from our US analysis. On CT, the appendix was not assessable in 8 (3%) of the 284 patients with a final diagnosis of appendicitis. These percentages are comparable with those in other studies evaluating visualisation of the appendix in patients with abdominal pain [10–12]. However, in a study by Rioux a higher visualisation rate (88%) of the appendix with US was recorded in patients suspected of having appendicitis [13]. Another potential limitation of this study was that observers recorded an imaging diagnosis based on all features recorded, not just appendiceal features, as well as on clinical information provided by the treating physician at the ED. The effect of these factors on the imaging diagnosis could not be evaluated in this study. A thickened appendix is an important feature in the diagnosis of appendicitis. Exact measurements of the (thickened) appendix were not made in this study; we only recorded whether the appendix was thickened (diameter greater than 6 mm) or not. This means that we could not distinguish between a slightly and a markedly enlarged appendix, nor evaluate the differential effects these observations may have on the diagnosis of appendicitis. This distinction is important as a normal appendix can measure between 6 and 10 mm, as reported by Tamburrini et al. [12]. These authors stated that the diagnosis of appendicitis cannot be made based on a thickened appendix alone, but only in association with other features.

The reported accuracy of US and CT for detecting appendicitis is high. Yet imaging examinations scored for study purposes are often evaluated by experienced observers (radiologists) in a single centre study. Patients included within such a diagnostic accuracy study are often clinically suspected of having the particular disease under study, i.e. appendicitis. The design of the present study tried to avoid such bias, by including unselected patients presenting with acute abdominal pain at the ED in a multi-centre study. US and CT were evaluated by a large number of different radiological residents and radiologists, who had different levels of experience, as this would more accurately reflect daily practice and make the results easier to generalise to other hospital settings.

Only three studies have evaluated the accuracy of individual features of appendicitis. None of them looked at combinations of features. There is only one study that we know of that has evaluated the accuracy of US features in detecting appendicitis [14]. Kessler and colleagues found that a thickened appendix (diameter greater than 6 mm) and non-compressibility of the appendix were the most accurate features on US to indicate appendicitis [14]. Our findings are in concordance with these results, as the US imaging profile with a thickened appendix and transducer tenderness, with or without peri-appendiceal fat infiltration, was most frequently found in appendicitis.

Two other studies reported in the literature described the accuracy of CT features. In a study by Rao et al. [7] evaluating the accuracy of single CT features of appendicitis in patients with suspected appendicitis, the sensitivity of an enlarged appendix was 100% and that of adjacent fat infiltration 93%. Other CT features of appendicitis had a sensitivity of less than 69% [7]. In a study by Daly et al. [2] two observers reviewed equivocal CT images of patients with suspected appendicitis. Within this study an appendiceal diameter of greater than 9 mm was considered decisive for the diagnosis of appendicitis, compared with isolated fat stranding and the presence of an appendicolith, which had little diagnostic value. If an appendicolith was present in combination with a thickened appendix (diameter greater than 9 mm), appendicitis was more likely [2]. It is questionable whether the presence of an appendicolith on top of a thickened appendix increases the probability of appendicitis, because in the present study an appendicolith had the lowest odds ratio of all CT-based features. The two previous studies of CT features analysed patients with clinically suspected appendicitis at the ED, whereas in the present study, all unselected patients with acute abdominal pain at the ED were included. Moreover, US and CT features were assessed and compared in the same cohort.

Other studies have examined patient characteristics as well as imaging features of patients with appendicitis missed on imaging [15]. Appendicitis was missed more often on CT if the clinical history was misleading, if there was a paucity of intra-abdominal fat, if contrast opacification of the caecum and distal small bowel was incomplete (oral contrast material was used), if small bowel obstruction was present and if typical CT signs of appendicitis were lacking (distended appendix, inflammatory changes in the peri-appendiceal fat, focal caecal wall thickening and the presence of an appendicolith) [15]. For US the sensitivity was significantly lower in female than in male patients with suspected appendicitis [16].

The focus of the present study was on imaging profiles associated with the diagnosis of appendicitis. Although appendicitis is still considered a clinical diagnosis by some, recent research derived from the same cohort shows that individual clinical features and laboratory test results have little discriminative power [17]. Furthermore, the classical combination of clinical features, namely migration of pain to the right lower quadrant (RLQ), tenderness in the RLQ and rigidity, is present in only 6% of patients with clinically suspected appendicitis [17].

Several pictorial essays have emphasised the possibility of missed appendicitis on US or CT if a diagnosis that mimics appendicitis was thought to be the cause of the abdominal pain. Such diagnoses include bowel obstruction, a gynaecological cause or epiploic appendagitis [18, 19]. These conditions can cause secondary thickening of the appendix or infiltration of the peritoneal fat in the RLQ. In the present study imaging profiles associated with a high probability of appendicitis were created. Conversely, the probability of appendicitis was very low if none of the selected features was present. This knowledge may also result in a lower percentage of missed diagnoses.

In conclusion, although all of the examined individual features were associated with a final diagnosis of appendicitis, only a few combinations of essential imaging features on US and CT were found to be associated with a high probability of appendicitis. When two or more of the essential imaging features are present on US or CT, very good accuracy can be achieved. These imaging profiles on US and CT can be used to adequately diagnose patients with appendicitis at the ED.