Surgeon-Performed Ultrasound in Diagnosing Acute Cholecystitis and Appendicitis

Background: The use of ultrasound (US) outside the radiology department has increased over the last decades, but large studies assessing the quality of bedside US are still lacking. This study evaluates surgeon-performed US (SPUS) and radiologist-performed US (RPUS) with respect to biliary disease and appendicitis.
Methods: Between October 2011 and November 2012, 300 adult patients with a referral for an abdominal US were prospectively enrolled in the study and examined by a radiologist as well as a surgeon. The surgeons had completed a 4-week US training program. The US findings of the surgeon and of the radiologist were compared to the final diagnosis, set by an independent external observer who reviewed each patient's chart.
Results: Among 183 patients with suspected biliary disease, 74 had gallstones and 21 had acute cholecystitis. SPUS and RPUS diagnosed gallstones with a sensitivity of 87.1 versus 97.3%, a specificity of 96.0 versus 98.9% and an accuracy of 92.3 versus 98.2%. The sensitivity, specificity and accuracy for acute cholecystitis by SPUS and RPUS were 60.0 versus 80.0%, 98.6 versus 97.8% and 93.9 versus 95.6%, respectively. Among 58 patients with suspected appendicitis, 15 had the disease. The sensitivity, specificity and accuracy for appendicitis by SPUS and RPUS were 53.3 versus 73.3%, 89.7 versus 93.3% and 77.3 versus 86.7%, respectively.
Conclusion: SPUS is reliable in diagnosing gallstones. Diagnosing cholecystitis and appendicitis with US is more challenging for both surgeons and radiologists.
Trial registration number: The study was registered at clinicaltrials.gov. Registration number: NCT02469935.


Introduction
The use of ultrasound (US) outside the radiology department, often referred to as point-of-care ultrasound (POCUS), has increased in the last decades as more compact and portable scanners have become available [1]. At Stockholm South General Hospital's surgery department, abdominal POCUS has been part of surgical resident training since 2004. We have previously shown that surgeon-performed ultrasound (SPUS) at the emergency department (ED) results in fewer additional examinations, fewer admissions and shorter lead times to surgery [2].
We, and others, have demonstrated that SPUS can detect gallstones with high diagnostic accuracy [3][4][5][6][7]. Using the same patient cohort as in our recent study [7], our current work focuses on the diagnosis of cholecystitis and appendicitis, two common causes of acute abdominal pain [8]. Previous work on the diagnostic accuracy of radiologist-performed ultrasound (RPUS) in cholecystitis and appendicitis shows variable results. The reported sensitivity ranges from 50 to 100% for cholecystitis [8,9] and from 52 to 76% for appendicitis [8,10]. The quality of abdominal US in these contexts appears to be even more operator dependent, which may have a negative impact on the quality of SPUS, since surgeons do not get the same amount of US training as radiologists [11]. To what extent this matters, however, is not known, since studies on the subject are few [12]. The aim of this study was to validate the diagnostic accuracy of SPUS regarding acute cholecystitis and appendicitis by comparing ultrasound examinations to the final diagnosis. For comparison, we examined the accuracy of RPUS using the same reference standard. To estimate the overall US competence of the participating radiologists and surgeons, we also included the diagnostic accuracy of detecting gallstones in the analysis.

Enrollment of patients
Three hundred patients referred to the radiology department at Stockholm South General Hospital for any diagnostic abdominal US examination were enrolled between October 2011 and November 2012, and informed consent was obtained. Exclusion criteria were age <18 years, inability to communicate with the examiner and referrals concerning metastases of the liver or contrast-enhanced examinations.

Data collection
Enrolled patients received one US examination by the study surgeon as well as the standard US examination by the on-duty radiologist. The examining surgeon and radiologist were blinded to each other's findings, and examinations were done right after one another when possible, and always within 6 h of each other. The surgeon took a short history from the patient and then performed the US, following a standardized protocol. Each examination took the surgeon approximately 15 min (10-20) to perform. The on-duty radiologist performed a standard care US focusing on the individual referral, and each examination took approximately 10 min (5-15). Among the radiologists, most of the scans were done by US-specialized radiologists with several years of training (73% of the scans were performed by specialists in radiology and the remaining 27% by radiologists in specialist training). The surgeons used a portable US machine of the model LOGIQ e with a convex (1.6-4.6 MHz) or linear (5-13 MHz) transducer, GE Healthcare, WuXi, China. The radiologists used a Philips iU22 with a convex C5-1 or a linear L12-5 transducer.

Criteria for patient inclusion
Patients with suspected biliary disease and/or suspected appendicitis were considered eligible for inclusion. Suspected biliary disease was defined as patients presenting with pain in the right upper quadrant (RUQ) and/or tenderness in the RUQ during physical examination and/or with a referral to the radiology department regarding gallstones and/or cholecystitis.
Suspected appendicitis was defined as patients presenting with pain in the right lower quadrant and/or tenderness in the right lower quadrant and/or with a referral to the radiology department regarding appendicitis.

Reference standard
The final diagnosis was set by an independent external observer, a senior consultant surgeon, based on discharge diagnosis, operation logs and pathology reports from each patient's chart. Findings of gallstones, acute cholecystitis and/or appendicitis were marked with "Yes" or "No" for each patient and each diagnosis in a separate protocol. The diagnosis of acute cholecystitis was set using the Tokyo Guidelines 2013 (TG13) criteria [13] together with operation logs and pathology reports.

US training of participating surgeons
Six study surgeons, five residents in their final years and one specialist in surgery, attended a 1-week course on US physics, technique, anatomy and hands-on training, led by specialists in US, followed by 3 weeks of training in the radiology department. The training has been thoroughly described previously [7]. After completing the training, each surgeon spent 2 weeks enrolling and scanning patients during office hours in the hospital's radiology department.

Statistical analysis
We calculated sensitivity, specificity, overall accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+) and negative likelihood ratio (LR-) for SPUS and RPUS in detecting gallstones, cholecystitis and appendicitis, respectively. The final diagnosis, defined above, was used as the reference standard. We calculated the inter-observer agreement between surgeons and radiologists for each of the three diagnoses using Cohen's kappa. The sample size of 300 patients comes from a power calculation in a previous study designed to detect a difference between SPUS and RPUS in detecting gallstones [7]. We used the same cohort for these additional diagnoses. To study whether there was any systematic difference between how often the surgeon and the radiologist set each diagnosis, we used McNemar's test. A p value <0.05 (two tailed) was considered statistically significant. Analyses were done in IBM SPSS Statistics, version 23. We used the efficient score due to Wilson to calculate the 95% confidence intervals (CI) of sensitivity, specificity and accuracy [14,15]. CI for LR were calculated using the Log method [16,17].
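The measures described above can be illustrated with a minimal Python sketch. It is not the authors' SPSS analysis. The function names are hypothetical, the Wilson interval is the standard score formula, and the likelihood-ratio interval uses a commonly used log-scale standard error, which is assumed here to correspond to the cited Log method. The 2x2 counts for SPUS in appendicitis (8 true positives, 7 false negatives, 3 false positives, 26 true negatives) are back-calculated from the percentages and the false negative and false positive counts reported in this paper, so they should be read as a reconstruction rather than as published data.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wilson_ci(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion successes/n."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def lr_ci(num_events, num_total, den_events, den_total, z=Z95):
    """Ratio of two proportions with a CI computed on the log scale.

    Assumed form of the log method; it fails if an event count is zero.
    """
    lr = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total
                   + 1 / den_events - 1 / den_total)
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def diagnostic_accuracy(tp, fn, fp, tn):
    """Accuracy measures of an index test against the reference standard."""
    n_pos, n_neg = tp + fn, fp + tn  # diseased / non-diseased per reference
    n = n_pos + n_neg
    return {
        "sensitivity": (tp / n_pos, wilson_ci(tp, n_pos)),
        "specificity": (tn / n_neg, wilson_ci(tn, n_neg)),
        "accuracy": ((tp + tn) / n, wilson_ci(tp + tn, n)),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_ci(tp, n_pos, fp, n_neg),  # sens / (1 - spec)
        "lr_neg": lr_ci(fn, n_pos, tn, n_neg),  # (1 - sens) / spec
    }


# Reconstructed (not published) counts for SPUS in appendicitis.
spus_appendicitis = diagnostic_accuracy(tp=8, fn=7, fp=3, tn=26)
for name in ("sensitivity", "specificity", "accuracy"):
    value, (low, high) = spus_appendicitis[name]
    print(f"{name}: {100 * value:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f})")
```

With these reconstructed counts the point estimates round to the 53.3%, 89.7% and 77.3% reported for SPUS in appendicitis. The interval bounds printed by the sketch are not taken from the paper.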

Patients
Of the 300 eligible patients, 228 met the criteria for suspected biliary disease (n = 183) and/or appendicitis (n = 58) and were included for further analysis (Fig. 1). Baseline characteristics for the two groups are shown in Tables 1 and 2, respectively.
Table 3.
One hundred and sixty of the patients were examined both by surgeon and radiologist. Inter-observer agreement (Cohen's kappa) between surgeon and radiologist regarding gallstones was 0.79 (good agreement). There was no systematic difference between surgeons and radiologists in how often the diagnosis was set (p = 0.454).
One hundred and fifty-two of the patients were examined both by surgeon and radiologist. Cohen's kappa regarding cholecystitis was 0.61 (good agreement). There was no systematic difference between surgeons and radiologists in how often the diagnosis was set (p = 0.227).
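For readers who want to see how agreement figures of this kind are obtained, the following Python sketch computes Cohen's kappa and a two-sided McNemar test from a paired 2x2 table. The function names and the counts are hypothetical and chosen only for illustration, since the paired surgeon-versus-radiologist tables are not given in the text. The exact binomial form of McNemar's test is used here as an assumption, because the text does not state which variant was applied.

```python
import math


def cohens_kappa(a, b, c, d):
    """Cohen's kappa for two raters and a binary finding.

    a: both positive, b: surgeon positive / radiologist negative,
    c: surgeon negative / radiologist positive, d: both negative.
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement
    # Agreement expected by chance, from each rater's marginal totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value based on the discordant pairs."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical paired counts, not data from this study.
a, b, c, d = 20, 5, 10, 65
print(f"kappa = {cohens_kappa(a, b, c, d):.3f}")
print(f"McNemar p = {mcnemar_exact(b, c):.3f}")
```

Kappa measures agreement beyond chance over all pairs, whereas McNemar's test looks only at the discordant pairs (b and c) and asks whether one examiner sets the diagnosis more often than the other. This is why a moderate kappa can coexist with a non-significant McNemar p value.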

False negative cases
Ten patients with acute cholecystitis were missed by the surgeon, the radiologist or both. Characteristics of the false negative cases are presented in Table 4. Radiologists found six cases that the surgeons missed, while surgeons found two cases missed by the radiologists. Surgeons and radiologists agreed on a negative finding in two cases. Four of the 10 patients had a white blood cell (WBC) count above 10 × 10^9/L, and four had a C-reactive protein (CRP) above 30 mg/L. Two patients had a temperature
Forty-one of the patients were examined both by surgeon and radiologist. Cohen's kappa regarding appendicitis was 0.41 (moderate agreement). There was no systematic difference between surgeons and radiologists in how often the diagnosis was set (p = 0.754).

False negative cases
In all seven cases missed by SPUS, the surgeon could not find the appendix. RPUS correctly diagnosed five of these. RPUS missed four cases of appendicitis, of which SPUS correctly diagnosed two. Appendicitis was confirmed at surgery in all of these cases. None of the false negative cases had a registered BMI >30. Information about BMI was missing in two of the cases.

False positive cases
SPUS misdiagnosed three cases and RPUS two cases as positive for appendicitis. SPUS and RPUS agreed on the positive finding in one of these cases, where both examiners found a tender 7-mm tubular structure in the RLQ. Three of the total four false positive cases were discharged with non-

Discussion
Our results show that the diagnostic accuracy of SPUS concerning cholecystitis is considerably lower than for gallstones. The specificity, however, is still high, and the diagnostic value of the investigation is underlined by the high positive likelihood ratio. This holds for surgeons as well as radiologists. For appendicitis, both surgeons and radiologists reach a rather low sensitivity (53.3 vs. 73.3%).
The high specificity and positive likelihood ratio nevertheless show that the examination still has diagnostic value. Hence, our study shows that it is more complicated to diagnose both cholecystitis and appendicitis with US than to diagnose gallstones. Although some studies with exceptionally good results for SPUS and RPUS regarding these diagnoses have been presented [18], our results are consistent with the reviewed literature and previous larger studies [9,10]. In the systematic review by Carroll et al. [19], which concluded that SPUS can be regarded as a sensitive and specific modality for the detection of appendicitis and gallstones, the included studies reported higher sensitivity and specificity than our study. However, in several of these studies the inclusion criteria were quite narrow and the prevalence of disease (appendicitis or gallstones) was high, which might have affected the reported sensitivity and specificity [20]. As also mentioned by Carroll et al., one must consider observer bias in some of the included studies, since the surgeons assessing the outcome were not blinded to the examinations, or in fact performed the US themselves.
Although this study focuses on the diagnostic accuracy of SPUS with respect to gallstones, cholecystitis and appendicitis, we also chose to compare the surgeon's bedside examination (with a portable machine) to the radiologist's examination (with a high-end machine) for each diagnosis to get an approximation of the overall difficulty in examining these patients. RPUS has for many years been accepted as the gold standard for diagnosing gallstones. It is also the recommended examination to confirm cholecystitis [13]. Limitations of the ultrasound examination for these diagnoses, other than in patients with high BMI and non-fasting patients, are rarely discussed. We looked closer at the diagnosis of cholecystitis in patients who were misdiagnosed by SPUS and RPUS. It seems that an early stage of the disease may lead to a considerable number of patients not fulfilling the diagnostic criteria of TG13 at the time of scanning, as shown in Table 4, which might have affected the accuracy. BMI seems to be of some but limited importance for diagnostics in our material. Among cholecystitis cases, there were six individuals with BMI >30, of whom four were misdiagnosed either by surgeon (three) or radiologist (two), one being missed by both. The patient with the highest BMI (39.7) was correctly diagnosed by SPUS but missed by RPUS. We found no systematic difference between SPUS and RPUS in how often the diagnosis was set, although the sensitivity (60.0 vs. 80.0%) differs considerably. However, it is hard to draw conclusions from this, considering the low prevalence of the disease in this cohort. In a wider perspective, if SPUS is viewed as one piece of a diagnostic puzzle, alongside other pieces such as auscultation, percussion, palpation and laboratory tests, a sensitivity as low as 60% might be quite acceptable, especially when specificity and LR+ are high and the examination is without side effects.
One could argue that the lower sensitivity of SPUS is outweighed, to some extent, by its accessibility and by the examining surgeon having the whole clinical picture.
RPUS for appendicitis is viewed differently from RPUS for acute cholecystitis. Most clinicians are aware of the problem with sensitivity and consider RPUS an adjunct to the clinical examination, with the possibility to confirm but not exclude the diagnosis [8,10]. Our results indicate that SPUS could be used in the same manner for both cholecystitis and appendicitis.
The patients included in our study represent a wide range of diagnoses causing acute abdominal pain. The prevalence of each of the studied conditions represents well the normal range of differential diagnoses seen at the ED [3,8,21,22], with appendicitis as a possible exception. The low prevalence of appendicitis in this study may be due to a preference at our ED for other diagnostic imaging for appendicitis, such as computed tomography. The range of differential diagnoses, however, is one of the strengths of the study, making it clinically relevant and lowering the risk of selection bias.
Another strength of our work is that we have assessed the accuracy of not only SPUS but also RPUS. This allows us to compare SPUS to standard care in the cohort. It also lets us draw general conclusions about US examinations for the diagnoses studied. The inclusion of the gallstone diagnosis, although studied before, also strengthens the study. Had it not been included, the relatively low accuracies for cholecystitis and appendicitis could have been attributed to low US proficiency. The accuracies of RPUS and SPUS for the detection of gallstones contradict such reasoning.

Conclusion
SPUS is reliable in diagnosing gallstones. Diagnosing cholecystitis and appendicitis with US is more challenging for both surgeons and radiologists. We believe that SPUS could be used as an adjunct to the clinical examination with the possibility to confirm but not exclude these diagnoses. Further studies are needed to elucidate the difficulties with bedside US in cholecystitis and appendicitis.