Introduction

Pediatric acute appendicitis (PAA) remains a major diagnostic challenge. Because misdiagnosis has substantial consequences in terms of morbidity, mortality and health-care costs, PAA is a public health problem that requires urgent attention [1].

In recent decades, multiple biomarkers have been explored as potential diagnostic tools for PAA [2,3,4,5]. Although some biomarkers have shown acceptable diagnostic yields, none has shown sufficient discriminatory capacity to be used as a stand-alone test for PAA [6, 7]. Moreover, many of these markers are used for research purposes only, and their implementation in clinical practice is not feasible because of cost or processing time.

Previous studies have analyzed the validity of ratios derived from the complete blood count as diagnostic tools, including the neutrophil-to-lymphocyte ratio, the platelet-to-lymphocyte ratio and the monocyte-to-lymphocyte ratio [8, 9]. These ratios have important advantages: they require no additional economic or human resources and are available from the outset of the assessment. Although the neutrophil-to-lymphocyte ratio showed good diagnostic performance, it cannot be used in isolation to diagnose PAA either.

The systemic immune-inflammation index (SII) is a novel ratio that has been proposed as a diagnostic and prognostic biomarker in different clinical situations, such as neoplastic processes and autoimmune diseases [10, 11]. This ratio, which combines the absolute counts of neutrophils, lymphocytes and platelets, reflects the degree of systemic inflammation and immune activation. Neutrophilia is a marker of acute stress and is closely related to bacterial infections, which, when severe, may be accompanied by lymphopenia. In addition, platelets can act as acute phase reactants in systemic inflammatory conditions. Given the intrinsic characteristics of PAA, we hypothesized that this ratio may be more valid than the neutrophil-to-lymphocyte ratio for the diagnosis of this condition. To the best of our knowledge, the diagnostic performance of SII has not been assessed in PAA to date.

On the other hand, specific scores such as the Alvarado score and the pediatric appendicitis score (PAS) are widely used for the diagnosis of PAA. Although the Alvarado score was not designed specifically for children, it has demonstrated adequate diagnostic performance in this population [12]. The PAS, a modified version of the Alvarado score for the pediatric population, has also shown good diagnostic yield but is far from perfect [13].

The aim of this study was to evaluate the diagnostic performance of a series of clinical, analytical and ultrasound parameters and to combine them into a simple, easy-to-apply index to improve the diagnosis of PAA.

Materials and methods

Study design

BIDIAP (BIomarkers for the DIagnosis of Appendicitis in Pediatrics) is a prospective non-randomized observational study [14, 15]. Participants were recruited from the Emergency Department and the Pediatric Department of our center (a tertiary-level pediatric hospital) whenever members of the research team were present. The recruitment period extended from February to December 2021. Inclusion and exclusion criteria are listed in Supplementary file 1.

Two groups of pediatric patients were included in this study: (1) patients with non-surgical abdominal pain (NSAP), that is, patients who were initially evaluated for clinically suspected acute abdomen and in whom urgent surgical abdominal pathology was ruled out, and (2) patients with a histopathologically confirmed diagnosis of PAA.

Peritoneal irritation was defined as the presence of Blumberg’s sign (rebound tenderness in the right iliac fossa), assessed by the physician who enrolled the patient in the study. Sociodemographic, clinical, analytical, surgical, radiological and histological variables of all patients were extracted from participants’ clinical records by the principal investigator (JAM).

All patients in group 1 were contacted 2 weeks after their inclusion in the study to ensure that they had not been diagnosed with PAA in that period. All patients in group 2 were followed up on an outpatient basis for 1 month after the intervention.

Sample collection

A venous blood sample (3.5 mL) was collected from each patient in an EDTA tube at the time of inclusion in the study, during their stay in the Emergency Department. Blood samples were processed by laboratory personnel blinded to the patients' group.

Calculation of systemic immune-inflammation index (SII)

The absolute neutrophil count (ANC) was defined as the total number of neutrophils in the complete blood count (CBC). The absolute platelet count (APC) was defined as the total number of platelets in the CBC. The absolute lymphocyte count (ALC) was defined as the total number of lymphocytes in the CBC.

The SII was calculated as (ANC × APC)/ALC [10]. To determine the optimal SII cut-off for distinguishing NSAP from PAA, the distance from each point on the ROC curve to the upper-left corner (sensitivity = 1, specificity = 1) was calculated as √[(1 − sensitivity)² + (1 − specificity)²], and the cut-off with the shortest distance (lowest value) was considered optimal.
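As an illustration, the SII formula can be written as a small Python function. This is a sketch only: it assumes that the three counts come from the same complete blood count and are expressed in the same unit (here 10⁹ cells/L, an assumption, since the units are not stated), and the function name `sii` and the example values are hypothetical, not study data.

```python
def sii(anc: float, apc: float, alc: float) -> float:
    """Systemic immune-inflammation index: (neutrophils x platelets) / lymphocytes.

    All three absolute counts are assumed to come from the same complete blood
    count and to share one unit (e.g. 10^9 cells/L).
    """
    if alc <= 0:
        raise ValueError("absolute lymphocyte count must be positive")
    return anc * apc / alc


# Hypothetical CBC values (10^9 cells/L), not taken from the study
value = sii(anc=12.0, apc=300.0, alc=1.5)
print(value)  # 2400.0
```

Because the lymphocyte count is in the denominator, a combination of neutrophilia, thrombocytosis and lymphopenia raises the index multiplicatively, which is the biological rationale given in the Introduction.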

Radiological determinations

The appendiceal caliber (maximum transverse diameter), the presence of an appendicolith on ultrasound and the presence of mesenteric lymphadenitis on ultrasound were evaluated. We did not consider other variables (such as loss of appendiceal wall stratification or appendiceal Doppler flow) because they are more operator dependent and, therefore, less useful as part of an index.

Mesenteric lymphadenitis was considered present on ultrasound when at least one lymph node measured more than 1 cm in its maximum axis.

Regarding appendiceal caliber, all measurements were performed on ultrasonographic studies by the radiologist on duty. To determine the optimal cut-off for appendiceal caliber (NSAP vs PAA), the distance from each point on the ROC curve to the upper-left corner was calculated as √[(1 − sensitivity)² + (1 − specificity)²], and the cut-off with the shortest distance (lowest value) was considered optimal.
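The closest-to-corner criterion described above (the same rule used for the SII cut-off) can be sketched as an exhaustive search over the observed values. This is an illustration under stated assumptions: values at or above the cut-off are classified as positive, ties in distance keep the lowest cut-off, and the function name and the example calipers are hypothetical, not study data.

```python
import math
from typing import Sequence


def closest_to_corner_cutoff(
    values: Sequence[float], labels: Sequence[int]
) -> tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) minimising the distance to
    the ROC corner where sensitivity = specificity = 1.

    labels: 1 = PAA (positive), 0 = NSAP (negative).
    A value >= cutoff is classified as positive.
    """
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both groups must be represented")

    best = None
    for cutoff in sorted(set(values)):  # every observed value is a candidate threshold
        sens = sum(v >= cutoff for v in positives) / len(positives)
        spec = sum(v < cutoff for v in negatives) / len(negatives)
        dist = math.sqrt((1 - sens) ** 2 + (1 - spec) ** 2)
        if best is None or dist < best[0]:  # strict '<' keeps the lowest cut-off on ties
            best = (dist, cutoff, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical appendiceal calipers in mm (not study data)
paa_calibers = [7.5, 8.0, 9.1, 6.2, 10.0]
nsap_calibers = [4.5, 5.0, 5.8, 6.5, 7.0]
cutoff, sens, spec = closest_to_corner_cutoff(
    paa_calibers + nsap_calibers, [1] * 5 + [0] * 5
)
print(cutoff, sens, spec)  # 7.5 0.8 1.0
```

This criterion weights sensitivity and specificity equally; it can select a different cut-off from the Youden index or from maximizing the percentage of correctly classified patients.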

Statistical analysis

For descriptive purposes, we used means and standard deviations or medians and interquartile ranges (IQR) for quantitative variables and proportions for categorical ones. The Kolmogorov–Smirnov test was used to assess the normality of quantitative variables. Sociodemographic and clinical variables were compared between groups using Fisher's exact test (categorical variables) and the Mann–Whitney U test (quantitative variables).

To identify independent predictors of PAA, a multivariable logistic regression was performed using forward stepwise selection with p for removal < 0.05. Continuous variables were first dichotomized at the cut-off with the best diagnostic performance for distinguishing PAA from NSAP, and they entered the model in increasing order of the p values obtained in the univariate analyses. This approach reduced collinearity between variables and yielded a parsimonious model. Integer points were assigned to each variable of the index in proportion to the beta coefficients obtained in the analysis. We assessed the discriminatory capacity of the BIDIAP index by calculating the area under the receiver operating characteristic (ROC) curve (AUC). For each cut-off value, the distance from the corresponding point on the ROC curve to the upper-left corner was calculated as √[(1 − sensitivity)² + (1 − specificity)²], and the cut-off with the shortest distance (lowest value) was considered optimal. Finally, we assessed the calibration of the BIDIAP index in our cohort using the Hosmer–Lemeshow test.
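The empirical AUC equals the Mann–Whitney U statistic divided by the product of the two group sizes, that is, the probability that a randomly chosen PAA patient scores higher than a randomly chosen NSAP patient. The minimal sketch below illustrates that equivalence; it is not the software routine used in the study, the function name is hypothetical, and the scores are invented integers chosen to mimic a points-based index, where ties are common and count as one half.

```python
from typing import Sequence


def auc_mann_whitney(values: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC as the proportion of (positive, negative) pairs
    in which the positive scores higher; ties count as one half.

    labels: 1 = PAA, 0 = NSAP.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both groups must be represented")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for small cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical index scores (not study data)
paa_scores = [9, 7, 5, 4]
nsap_scores = [0, 2, 4, 3]
auc = auc_mann_whitney(paa_scores + nsap_scores, [1] * 4 + [0] * 4)
print(auc)  # 0.96875
```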

Statistical significance was set at p < 0.05. Statistical analyses were performed with Stata 17.0 (StataCorp LLC).

Research ethics board committee

This study was approved by our center's clinical research ethics committee on December 18, 2020, under code PI_2020/112. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki. The parents or legal representatives of all participants signed an informed consent form prior to their inclusion in the study.

Results

Demographic and clinical characteristics

Among the 151 patients recruited, 17 (11%), all in the NSAP group, were excluded because of missing appendiceal caliber measurements. Therefore, the final sample consisted of 134 patients divided into two groups: (1) patients with non-surgical abdominal pain in whom the diagnosis of PAA was excluded (n = 36) and (2) patients with a confirmed diagnosis of PAA (n = 98). Participants' sociodemographic and clinical characteristics by group are shown in Table 1. The number of emetic episodes differed significantly between groups (p < 0.0001), whereas the differences in age (p = 0.13) and sex (p = 0.07) did not reach statistical significance. No significant differences were observed between included and excluded children in sociodemographic and clinical variables (data not shown). None of the patients in the NSAP group developed PAA during follow-up.

Median (IQR) SII values were 696.34 (355.67–1350.38) in group 1 and 2381.85 (1409.14–3497.33) in group 2 (p < 0.0001). SII values by group are shown in Fig. 1; a logarithmic scale was used because of the wide range of values obtained. The AUC for SII was 0.85 (95% CI 0.78–0.92) (p < 0.0001). The cut-off with the highest percentage of correctly classified observations (81%) was 890, yielding a sensitivity of 89.80% and a specificity of 66% (positive likelihood ratio: 2.64). The ROC curve for SII is shown in Fig. 2.
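The positive likelihood ratio reported for the SII cut-off follows directly from sensitivity and specificity, LR+ = sensitivity / (1 − specificity). The short sketch below reproduces it from the rounded values given in the text; the function name is illustrative only.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); both inputs are proportions in [0, 1]."""
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1) for LR+ to be finite")
    return sensitivity / (1.0 - specificity)


# Sensitivity and specificity reported for SII >= 890
print(round(positive_likelihood_ratio(0.8980, 0.66), 2))  # 2.64
```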

Table 1 Clinical and sociodemographic characteristics of the study participants
Fig. 1 Box plot of SII values (logarithmic scale) in the two study groups

Fig. 2 ROC curve for SII (group 1 vs 2)

The odds ratios (ORs) with 95% confidence intervals (CIs), p values and pseudo R² values for PAA associated with each independent predictor are shown in Table 2. The multivariable model using forward stepwise selection showed that the variables that independently predicted the odds of PAA were appendiceal caliber ≥ 6.9 mm, SII ≥ 890 and the presence of peritoneal irritation. The beta coefficients obtained for these predictors were 5.42 (appendiceal caliber ≥ 6.9 mm), 3.07 (SII ≥ 890) and 2.38 (presence of peritoneal irritation). These coefficients were divided by the smallest coefficient (2.38), multiplied by 2 and converted to integer points, resulting in the following equation:

$$\text{BIDIAP index} = 4\,(\text{appendiceal caliber} \ge 6.9\ \text{mm}) + 3\,(\text{SII} \ge 890) + 2\,(\text{presence of peritoneal irritation}),$$
where each term in parentheses equals 1 if the condition is met and 0 otherwise.
Table 2 Potential variables assessed for inclusion in the score, ordered by statistical significance and pseudo R² value

Table 3 shows the components of the BIDIAP index and their scoring weights. The mean (SD) BIDIAP index was 2.38 (2.06) in children with NSAP and 7.89 (1.50) in the PAA group (p < 0.0001). The area under the curve (AUC) for the BIDIAP index was 0.97 (95% CI 0.95–0.99) (p < 0.001) (Fig. 3). The cut-off value with the shortest distance on the ROC curve was 4, with a sensitivity of 98.98% and a specificity of 77.78%. According to that cut-off, a diagnosis of PAA could be established if either of the following two conditions was met: (1) appendiceal caliber ≥ 6.9 mm or (2) SII ≥ 890 together with peritoneal irritation.
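To make the scoring rule concrete, the index and its cut-off of 4 can be restated in a few lines of Python. This sketch assumes the caliber is recorded in millimetres and SII is computed in the same units as in the study; the class, constant and function names are hypothetical, and the code only restates the published weights, not a validated clinical tool.

```python
from dataclasses import dataclass

# Point weights from the BIDIAP index equation
CALIBER_POINTS = 4
SII_POINTS = 3
PERITONEAL_POINTS = 2
POSITIVE_THRESHOLD = 4  # cut-off with the shortest ROC distance in the study cohort


@dataclass
class Patient:
    appendiceal_caliber_mm: float
    sii: float
    peritoneal_irritation: bool  # positive Blumberg's sign


def bidiap_index(p: Patient) -> int:
    """Sum the points of each dichotomized predictor (each condition counts 0 or 1)."""
    score = 0
    if p.appendiceal_caliber_mm >= 6.9:
        score += CALIBER_POINTS
    if p.sii >= 890:
        score += SII_POINTS
    if p.peritoneal_irritation:
        score += PERITONEAL_POINTS
    return score


def suggests_paa(p: Patient) -> bool:
    """True when the index reaches the cut-off of 4."""
    return bidiap_index(p) >= POSITIVE_THRESHOLD


# Hypothetical patient: normal-looking caliber, high SII, peritoneal irritation
example = Patient(appendiceal_caliber_mm=5.0, sii=1200.0, peritoneal_irritation=True)
print(bidiap_index(example), suggests_paa(example))  # 5 True
```

With these weights, a caliber ≥ 6.9 mm alone reaches the threshold (4 points), whereas SII ≥ 890 alone (3 points) or peritoneal irritation alone (2 points) does not; only their combination (5 points) does.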

Table 3 Proposed BIDIAP index
Fig. 3 ROC curve for the BIDIAP score (group 1 vs 2)

Table 4 shows alternative cut-off points in the BIDIAP index with their respective diagnostic performance in terms of sensitivity, specificity and positive likelihood ratio.

The BIDIAP index showed good calibration in our sample (Hosmer–Lemeshow test, p = 0.82), indicating no evidence of poor fit (Fig. 4).
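The Hosmer–Lemeshow statistic compares observed and expected numbers of PAA cases within groups of patients ordered by predicted risk and is referred to a chi-square distribution with (groups − 2) degrees of freedom. The sketch below illustrates that computation in plain Python. It is not the software routine used in the study: it assumes predicted probabilities strictly between 0 and 1 (for example, from a logistic model fitted to the index), splits the sorted predictions into near-equal-size groups without keeping tied predictions together, and computes the p value with a hand-written series; all names are hypothetical, so results may differ from those reported, especially for a points-based index with few distinct values.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X >= x) via the power series of the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z > 700:  # avoids float overflow; the tail is negligible for the small df used here
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def hosmer_lemeshow(
    probs: Sequence[float], outcomes: Sequence[int], groups: int = 10
) -> tuple[float, float]:
    """Return (HL statistic, p value) with groups - 2 degrees of freedom.

    probs: predicted probabilities of PAA, strictly between 0 and 1.
    outcomes: observed diagnoses (1 = PAA, 0 = NSAP).
    """
    if groups < 3 or len(probs) < groups or len(probs) != len(outcomes):
        raise ValueError("need >= 3 groups and at least one patient per group")
    if any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("predicted probabilities must lie strictly between 0 and 1")
    # Sort by predicted risk only (stable sort: tied predictions keep input order)
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]  # near-equal-size groups
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of PAA cases
        observed = sum(y for _, y in chunk)  # observed number of PAA cases
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf(stat, groups - 2)


# Toy example where observed and expected counts match exactly in every group
stat, p = hosmer_lemeshow([0.5] * 6, [0, 1, 0, 1, 0, 1], groups=3)
print(stat, p)  # 0.0 1.0
```

A large p value, as in the study, means the test found no evidence of miscalibration; with small samples the test has limited power, so a calibration plot is a useful complement.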

Table 4 Alternative BIDIAP score cut-off values for the discrimination between NSAP and PAA. Diagnostic performance is presented in terms of sensitivity, specificity, and positive likelihood ratio
Fig. 4 Calibration plot of the BIDIAP score (Hosmer–Lemeshow test)

Discussion

In this prospective study of 134 patients, we performed a thorough statistical analysis guided by biological plausibility criteria to identify independent predictors of PAA and to develop an easy-to-use index that showed excellent diagnostic performance. The BIDIAP index includes easily accessible variables that are commonly assessed in pediatric emergency departments, including SII, a ratio calculated from complete blood count parameters whose diagnostic performance in PAA had not been previously evaluated.

To date, multiple scores have been evaluated as potential diagnostic tools for PAA. The Alvarado score, initially designed for adults and subsequently extrapolated to the pediatric population, has shown moderate performance in the diagnosis of PAA [12]. Similar findings have been reported for the PAS, a score designed specifically for the pediatric population [16]. More recently, the appendicitis inflammatory response (AIR) score and the pediatric appendicitis risk calculator (pARC) have demonstrated diagnostic superiority over the PAS and the Alvarado score, although clinical experience with them is still limited [17]. Compared with previous indices, the BIDIAP index has two potential advantages: (1) a higher diagnostic yield in our cohort and (2) remarkable simplicity, as it comprises only three parameters and does not require specialized laboratory tests.

SII has been previously analyzed as a prognostic tool in multiple pathologies, such as advanced non-small cell lung cancer [18] and extrahepatic cholangiocarcinoma [19]. Evidence on its usefulness as a diagnostic tool is more limited and includes contexts such as subacute thyroiditis [20]. To our knowledge, this is the first study assessing the diagnostic yield of SII in PAA. In our sample, the discriminatory capacity of SII (AUC = 0.85; 95% CI 0.78–0.92) was similar to that of the neutrophil-to-lymphocyte ratio (AUC = 0.83; 95% CI 0.75–0.90), but SII showed a higher OR for PAA and a higher pseudo R². Although more evidence is needed, our findings suggest that SII might emerge as one of the best biomarkers for the diagnosis of PAA in clinical practice.

Our results agree with previous studies that reported that the appendiceal caliber was a strong predictor of PAA [21].

In our sample, the diagnostic performance of appendiceal caliber alone was excellent (AUC 0.90; 95% CI 0.84–0.97). Indeed, with the BIDIAP index, a diagnosis of PAA could be established when that condition alone was met (appendiceal caliber ≥ 6.9 mm). Our index adds to previous evidence because the diagnosis of PAA can also be established when that condition is not met or not reported, provided that the other two conditions (SII ≥ 890 and peritoneal irritation) are present. Ultrasound visualization of the appendix in children is highly operator dependent, and the appendiceal caliber can sometimes be very difficult to measure [22]. Although the literature on the proportion of appendices visualized in NSAP and PAA cases is scarce, in our clinical experience non-visualization more likely corresponds to NSAP cases (with a smaller appendiceal caliber), since in PAA the appendix is usually enlarged and shows locoregional inflammatory changes that facilitate its identification. However, recent working groups have improved ultrasonographic identification of the appendix through specific reporting templates and education sessions [23].

Regarding physical examination, the most important clinical features assessed for the diagnosis of PAA are peritoneal irritation (defined as a positive Blumberg's sign) and abdominal guarding. In our sample, peritoneal irritation (OR = 3.35; 95% CI 1.67–6.73) showed an association with PAA similar to that of abdominal guarding (OR = 3.45; 95% CI 1.72–6.94). However, after adjustment for appendiceal caliber and SII, abdominal guarding was not significantly associated with the odds of PAA. This may indicate that peritoneal irritation is usually easier to detect than abdominal guarding, especially in patients with a high body mass index. In this study, participants were recruited when a member of the research team was present, and the physical examination was performed by that team member following a standard procedure. Therefore, although some inter-observer variability cannot be ruled out, we expect it to be small.

Our study has several strengths, including its prospective design, its large sample size and its thorough statistical analysis. The statistical model presented here largely corrects for collinearity, which, in our opinion, is one of the main problems of the diagnostic scores for PAA published to date.

We believe that studies assessing the diagnostic performance of different biomarkers in PAA would benefit from more complex statistical analyses, such as those presented in this study, which allow confounding to be controlled and the strength of associations and the robustness of results to be assessed.

Despite our findings, we must acknowledge some limitations.

First, we used convenience sampling, which is susceptible to selection bias. In addition, 17 patients from the NSAP group (11% of all recruited patients) were excluded because of missing appendiceal caliber data. Nevertheless, inclusion and exclusion criteria were strictly applied, and we did not observe significant differences between included and excluded patients in any of the sociodemographic and clinical characteristics evaluated. For a selection bias to occur in this study, most of the excluded patients would have had to have a large appendiceal caliber (≥ 6.9 mm), which is unlikely, as the most probable reason for missing data is that the appendix could not be visualized. Second, the BIDIAP index requires external validation before its use can be recommended in clinical practice. Until then, it should be noted that, in our sample, the index proved valid, with excellent discriminatory capacity and calibration.

In conclusion, our results showed that the BIDIAP index is an easy-to-use and inexpensive diagnostic tool with excellent diagnostic performance in PAA. Although external validation is necessary, these initial results are promising.