Outcome in patients with open abdomen treatment for peritonitis: a multidomain approach outperforms single domain predictions

Numerous patient-related clinical parameters and treatment-specific variables have been identified as causing or contributing to the severity of peritonitis. We postulated that a combination of clinical and surgical markers and scoring systems would outperform each of these predictors in isolation. To investigate this hypothesis, we developed a multivariable model to examine whether survival outcome can reliably be predicted in peritonitis patients treated with open abdomen. This single-center retrospective analysis used univariable and multivariable logistic regression modeling in combination with repeated random sub-sampling validation to examine the predictive capabilities of domain-specific predictors (i.e., demography, physiology, surgery). We analyzed data of 1,351 consecutive adult patients (55.7% male) who underwent open abdominal surgery in the study period (January 1998 to December 2018). Core variables included demographics, clinical scores, surgical indices and indicators of organ dysfunction, peritonitis index, incision type, fascia closure, wound healing, and fascial dehiscence. Postoperative complications were also added when available. A multidomain peritonitis prediction model (MPPM) was constructed to bridge the mortality predictions from individual domains (demographic, physiological and surgical). The MPPM is based on data of n = 597 patients, features high predictive capabilities (area under the receiver operating characteristic curve: 0.87 (0.85 to 0.90, 95% CI)) and is well calibrated. The surgical predictor “skin closure” was found to be the most important predictor of survival in our cohort, closely followed by the two physiological predictors SAPS-II and MPI. Marginal effects plots highlight the effect of individual predictors on the prediction of survival outcome in patients undergoing staged laparotomies for treatment of peritonitis.
Although most single indices exhibited moderate performance, we observed that the predictive performance was markedly increased when an integrative prediction model was applied. Our proposed MPPM integrative prediction model may outperform the predictive power of current models.


Introduction
Primary and secondary infectious peritonitis pose major challenges for clinicians globally [1,2]. Despite extensive research and aggressive surgical management of source control (such as open abdominal clinical strategies with repeated abdominal lavage or interventions), the prognosis of generalized peritonitis remains poor, with mortality rates of up to 60% [3]. There is some evidence of favorable outcomes when using staged relaparotomies and repeated lavage in selected cohorts [1,4,5]. This approach is now accepted as a standard procedure for the septic abdomen [6,7].
Patients usually undergo explorative laparotomy if peritonitis is suspected. Following meticulous exploration of the abdomen, thorough cleansing with copious fluids, and surgical repair of lesions, the abdominal cavity is left open and the small bowel is protected with an intestine bag. To minimize abdominal wall retraction during open abdomen treatment, a mesh is sutured into the dorsal aspect of the rectus muscle.
Despite considerable clinical and scientific efforts, mortality remains high in these patients, and predictive scores are warranted to change the clinical approach from "reaction to deterioration" towards a more proactive "anticipation of deterioration" [8]. Bosscha et al. previously stated that early and reliable classification of intra-abdominal sepsis is essential. This could further be useful to select patients for aggressive surgical techniques and to evaluate and compare the results of different clinical treatment regimens [9]. Bosscha et al. demonstrated an association of the Mannheim Peritonitis Index (MPI) and the Acute Physiology And Chronic Health Evaluation (APACHE) II in a cohort of peritonitis patients [9]. In light of the study design and the focus on chronic concomitant diseases, the best resource allocation and acute surgical management in these patients remains controversial [10]. Another group assessed the feasibility of predicting mortality with a country-specific calibrated Simplified Acute Physiology Score (SAPS) II in intensive care unit (ICU) patients, and showed that overall mortality is overestimated when the SAPS II is used [11]. We hypothesized that an integrative approach may be useful to improve prediction modeling in patients with peritonitis, and could include demographics, typical physiological scoring systems (SAPS II and MPI) as well as a specific clinical course.
Advanced prediction models might lead to a better understanding of the variables that affect this clinical challenge. Therefore, we investigated the utility of a newly designed multivariable approach that deliberately employs laboratory results and clinical and surgical indices. This was done in an effort to best reflect the clinical challenges facing a large cohort of peritonitis patients undergoing open abdomen and staged lavage treatment. We combined typical clinical scores with surgical parameters, as this might best reflect the decision-making performed by health care professionals.

Methods
The study was performed in adherence to the principles in the Declaration of Helsinki, and approved by the Hamburg Medical Association (#WF072/20) as the responsible institutional review board/human ethics board. The need for individual patients' or legal surrogates' or parents' or legal guardians' written informed consent was deemed unnecessary, given the retrospective nature of the data analysis.

Patients
Data of 1,351 adult patients treated for peritonitis in the Department of General and Visceral Surgery of the Asklepios Hospital Altona, Hamburg, Germany, were analyzed retrospectively. All adult patients were treated in the unit ICU and had undergone open abdomen treatment and staged lavage during the study interval (January 1998 to December 2018). For "transparent reporting of a multivariable prediction model for individual prognosis or diagnosis" we followed the Equator TRIPOD statement [12].

Measurements
Age at admission, Mannheim Peritonitis Index (MPI), number of staged lavages, duration of mechanical ventilation (in hours), incision type (median vs. transverse) during staged lavage, fascia closure at the end of the staged lavage, presence of a wound-healing disorder, fascial dehiscence, postoperative complications, and mortality were documented.
SAPS-II scores [13-15] were available for the period 1998 to 2018 and MPI scores from 2008 to 2018. SAPS-II scores were lower during the years 1998 to 2007 (median 40.0 and interquartile range [32.0-52.0]) than during the period 2008-2018 (46.0 [36.0-57.0]), for which both physiological scores were available (p < 0.001). This possible source of bias in our analysis of mortality prediction is discussed within the context of the limitations of this study. A full data availability table is presented in the Supplementary Material.
The following standard surgical procedures were applied: the type of incision was based on the incision previously performed. When no previous abdominal incisions or laparoscopic trocar incisions were recorded or observed, a transverse incision was used. All four quadrants were inspected and thoroughly cleaned with copious fluid. After ensuring that no infectious pockets were left undrained, the abdominal cavity was left open, the small bowel was covered with negative pressure (Vi-Drape® intestine bag, Cardinal Health GmbH, 22848 Norderstedt, Germany), and Parietex® mesh (Medtronic GmbH, 40670 Meerbusch, Germany) was sutured into the dorsal aspect of the rectus muscle.
Wound-healing disorders (WHD) were defined as any evidence of cutaneous wound infection or need for reopening of a skin wound, with or without bacteria detected in microbiological swabs. Any evidence of reopening of the fascia after the closure of staged lavage was defined as fascial dehiscence. Data on the severity of the disease, as determined by the SAPS-II score, were collected upon admission to the intensive care unit. The surgical results were determined retrospectively.

Statistical analyses
Continuous variables were expressed as mean and standard deviation when normally distributed, based on a Shapiro-Wilk test of normality and visual inspection of Q-Q plots, and as median and interquartile range (IQR) otherwise [16]. Differences in a continuous outcome between two groups were assessed with Student's t-test in case of normally distributed outcomes and with a Mann-Whitney U test otherwise. Proportions are presented as numbers and percentages, and tests of association of two groups with a binary outcome were performed using a chi-square test.
To assess the ability of the various demographic, physiological and surgical variables to predict the binary survival outcome, we first computed univariable logistic regression for each predictor. Second, those predictors associated with the mortality outcome and with sufficient observations were selected and combined in domain-specific multivariable logistic regression models (demographics, physiological, surgical) to examine the individual prediction skill of each domain. Finally, all predictors were combined in a multivariable logistic regression model [17]. Goodness of fit of these models was assessed with the Hosmer-Lemeshow test as well as with calibration plots, and overall model performance was quantified using the Brier Score [18,19]. The discriminative ability of each logistic regression model was quantified with the concordance statistic, i.e., the area under the receiver operating characteristic curve (AUROC) [20,21]. Predictor importance in the multi-domain prediction model was assessed with two methods: (i) the absolute value of the t-statistic for each model parameter and (ii) dominance analysis [22,23]. In terms of missing data, we followed a complete case analysis and omitted missing values for each regression model.
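As an illustration of the two performance measures named above, the following minimal Python sketch computes the Brier Score and the AUROC from predicted probabilities. It is not the analysis code of this study (which was run in R); the function names and the eight toy observations are assumptions made purely for illustration.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auroc(y_true, p_pred):
    """Concordance statistic: share of (died, survived) pairs in which the
    patient who died received the higher predicted risk; ties count one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = died, 0 = survived.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

print(brier_score(y, p))  # lower is better; 0 would be a perfect prediction
print(auroc(y, p))        # 0.5 = no discrimination, 1.0 = perfect ranking
```

The Brier Score rewards both calibration and discrimination, whereas the AUROC depends only on how the predicted risks rank the patients.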
In order to (i) internally validate the models' ability to discriminate between the survival outcomes (as expressed with the AUROC values for each regression model) and (ii) to compare the predictive skill across single predictor models, domain-specific models and the multidomain model, we employed a repeated random sub-sampling validation for each regression model. The following steps were repeated a thousand times for each regression model: (1) The available data were randomly divided into a training set (containing 65% of the available data) and a validation set, and the logistic regression model was fit using the training data. (2) To categorize the model prediction probabilities into the binary survival outcome categories (survived, died), an optimal cutoff value for the predicted probabilities was computed according to the Youden Index. (3) The fitted regression model, in combination with the optimal cutoff value, was subsequently used to predict the individual binary outcomes of the validation set (containing the remaining 35% of the data). (4) The following indicators of prediction performance were computed for the validation set: balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and the diagnostic odds ratio [24]. Overall, this validation procedure resulted in a distribution with 1,000 samples for the indicators of prediction performance, which were depicted with box plots.
To determine the sample size required for the logistic regression models, we followed the method of Peduzzi et al. (1996) [25]: assuming 10 possible covariates in the full multi-domain regression model and a mortality rate of 20%, we calculated 500 as the minimum number of patients. A p < 0.05 was considered statistically significant. All statistical analyses were computed with R (R version 4.0.2; R Core Team 2020). Calibration plots were computed with the package givitiR [26].
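The four validation steps can be sketched in Python as follows. This is a simplified illustration rather than the study's R implementation: it uses a single synthetic predictor, a hand-written gradient-descent fit, 100 instead of 1,000 repetitions, and reports only balanced accuracy. All names, the simulated data and these settings are assumptions.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, steps=200, lr=0.5):
    """Single-predictor logistic regression fitted by plain gradient descent."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / len(x)
        b1 -= lr * g1 / len(x)
    return b0, b1


def sens_spec(y, p, cutoff):
    """Sensitivity and specificity when p >= cutoff is classified as died (1).
    Assumes both outcomes are present in y."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    n_pos = sum(y)
    return tp / n_pos, tn / (len(y) - n_pos)


def youden_cutoff(y, p):
    """Cutoff maximizing the Youden Index J = sensitivity + specificity - 1."""
    best_cut, best_j = 0.5, -1.0
    for cut in sorted(set(p)):
        sens, spec = sens_spec(y, p, cut)
        if sens + spec - 1.0 > best_j:
            best_cut, best_j = cut, sens + spec - 1.0
    return best_cut


def subsample_validation(x, y, repeats=100, train_frac=0.65, seed=1):
    """Repeated random sub-sampling; returns one balanced accuracy per repeat."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_train = round(train_frac * len(x))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        # Step 1: fit on the training set only.
        b0, b1 = fit_logistic([x[i] for i in train], [y[i] for i in train])
        # Step 2: Youden-optimal cutoff, also derived from the training set.
        p_train = [sigmoid(b0 + b1 * x[i]) for i in train]
        cut = youden_cutoff([y[i] for i in train], p_train)
        # Steps 3 and 4: classify the validation set and score the result.
        p_valid = [sigmoid(b0 + b1 * x[i]) for i in valid]
        sens, spec = sens_spec([y[i] for i in valid], p_valid, cut)
        scores.append((sens + spec) / 2.0)
    return scores


# Synthetic cohort (invented): one standardized predictor, roughly 30% mortality.
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(300)]
y = [1 if data_rng.random() < sigmoid(-1.0 + 1.5 * xi) else 0 for xi in x]

balanced_acc = subsample_validation(x, y)
print(round(statistics.median(balanced_acc), 2))
```

The remaining indicators (positive and negative predictive value, diagnostic odds ratio) follow from the same true/false positive and negative counts. The sketch derives both the coefficients and the cutoff from the training set alone, so the validation set plays no part in model fitting.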
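The sample size figure can be reproduced with the events-per-variable rule commonly attributed to Peduzzi et al. The notation below is an assumption introduced for illustration: k denotes the number of covariates and p the proportion of the less frequent outcome, here the assumed mortality rate.

```latex
N_{\min} = \frac{10\,k}{p} = \frac{10 \times 10}{0.20} = 500
```

At an expected mortality of 20%, 500 patients yield about 100 deaths, i.e., roughly ten events for each of the 10 candidate covariates.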

Demographics and clinical outcome
Data of 1,351 consecutive patients undergoing staged laparotomies were analyzed. Table 1

Domain-specific prediction models
We next considered the association between individuals' demographic, physiological and surgical variables and the binary survival outcome (Table 2). Here, univariable logistic regression reveals significant associations between mortality and patient age, SAPS-II and MPI scores, the number of days in the ICU, as well as for wound complications and surgical management (skin and fascial closure). To examine the predictive capabilities of the demographic, physiological and surgical domain, we grouped the predictors into domain-specific multivariable logistic regression models (Table 3). Calibration plots for these three models of predicted mortality versus observed mortality in our cohort are shown in Fig. 1. The demographic prediction model is well calibrated, whereas the physiological prediction underestimates the mortality for low probabilities on the one hand and overestimates high mortality probabilities on the other hand. The surgical prediction model generally overestimates the observed mortality in the cohort.
We found that patient age was the strongest predictor of the demographic variables, with an odds ratio (OR) of 1.04 (1.02-1.06, 95% CI). A unit increase in the SAPS-II score and MPI score increases the odds of mortality by 6% (5%-8%) and 5% (3%-7%), respectively. The odds

The integrative prediction models
A final prediction model, the multidomain peritonitis prediction model (MPPM), was constructed (Table 4). The model is built upon the clinical observation that an integrative data analysis might best indicate clinical outcomes. The MPPM is based on data of n = 597 patients, features a high AUROC value of 0.87 (0.85-0.90) and a Brier Score of 0.12, and is well calibrated (Fig. 1D). Figure 2 illustrates the marginal effects for each predictor, and highlights that mortality steadily increases for older patients and higher SAPS-II and MPI scores. For example, a SAPS-II score of 80 predicts a survival probability of 56% (41%-70%, 95% CI), holding the other predictors at the values referenced in Fig. 2 constant. Skin closure at the end of surgery is a powerful predictor of survival: in patients in whom skin closure cannot be achieved, mortality is predicted to be 90% (78%-96%) when the other predictors are held constant. Supplemental Fig. 2 illustrates the relative importance of the individual predictors in the integrative MPPM model. Estimates of the relative importance based on the absolute value of the t-statistic in the prediction model are shown in Panel A, whereas dominance analysis was employed to compute the estimates of relative importance in Panel B. The latter method systematically examines all possible subsets of the model predictors and evaluates the additional contribution of a particular predictor to a measure of model fit (in our case McFadden's R²). The two independent methods agree in their overall ranking of variable importance. The physiological predictors SAPS-II score and MPI score are slightly less important than skin closure.
The demographic predictors appear to be only marginally important relative to the surgical and physiological predictors.
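The subset logic behind dominance analysis can be sketched as follows. The code shows general dominance: a predictor's incremental contribution to a fit measure is averaged within each subset size and then across sizes. In an actual analysis the fit measure would be McFadden's R² of a logistic regression refitted on each subset. Here a made-up function `toy_fit` with invented weights for the placeholder predictors `x1` to `x3` stands in for it, so the numbers carry no clinical meaning.

```python
from itertools import combinations


def general_dominance(predictors, fit_measure):
    """General dominance weight of each predictor.

    fit_measure(subset) must return the fit (e.g., McFadden's R^2) of a model
    containing exactly the predictors in `subset`.
    """
    n_pred = len(predictors)
    weights = {}
    for target in predictors:
        others = [v for v in predictors if v != target]
        size_means = []
        for k in range(n_pred):  # sizes 0 .. n_pred-1 of subsets without target
            gains = [
                fit_measure(frozenset(s) | {target}) - fit_measure(frozenset(s))
                for s in combinations(others, k)
            ]
            size_means.append(sum(gains) / len(gains))
        weights[target] = sum(size_means) / n_pred
    return weights


# Hypothetical stand-in for refitting a regression on each predictor subset:
# overlapping contributions, so the fit is not simply additive.
TOY_WEIGHTS = {"x1": 0.30, "x2": 0.20, "x3": 0.05}


def toy_fit(subset):
    unexplained = 1.0
    for name in subset:
        unexplained *= 1.0 - TOY_WEIGHTS[name]
    return 1.0 - unexplained


dominance = general_dominance(list(TOY_WEIGHTS), toy_fit)
for name, value in sorted(dominance.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

The weights sum to the fit of the full model minus that of the empty model, so they can be read as each predictor's share of the overall fit.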

Comparison of domain-specific and integrative prediction models
To conclude, we compared the predictive ability of the single predictor models and the three domain-specific models (demographic, physiological, surgical) with the integrative multidomain peritonitis prediction model within a repeated random sub-sampling validation framework. Figure 3 shows the median and interquartile ranges for a suite of performance indicators in the 1,000 random subsampling ensemble. Note that the regression coefficients used for the mortality prediction probabilities and the optimal cutoff for distinguishing between survival and death were solely based on the data of random training sets. The MPPM model features the overall highest balanced accuracy, with a median of 78% (76%-80%, interquartile range). The surgical-domain model demonstrates a higher diagnostic odds ratio (median 34; IQR: 24-62) than the multidomain model (median 13; IQR: 10-16). It is characterized by a low proportion of correctly identified positives (sensitivity) but a high proportion of correctly identified negatives (specificity). While individual domain-specific models show capabilities similar to those of the MPPM for some indicators, such as the physiological model for sensitivity, Fig. 3 illustrates the key finding of our study, namely that combining the predictors of various domains increases the overall ability to predict the binary survival outcome in patients undergoing staged laparotomies for peritonitis treatment (see Fig. 1).

Fig. 1 (caption, partly recovered): [...] panel C). Panel D illustrates the calibration of the multidomain peritonitis prediction model, which includes the predictors from all three domains. The diagonal red lines denote a 1:1 relationship between predicted and observed mortality.

Fig. 3 (caption, partly recovered): [...] predictive value, sensitivity and specificity). Box plots illustrate the median and interquartile ranges of these distributions. Capitalized predictors denote logistic regression models including all predictors of a particular domain, i.e., the model DEMOGRAPHICS includes the age of the patient and sex as predictors.

Discussion
Prediction modeling seems of great clinical importance in the clinical scenario investigated. Current predictions mostly rely on single-index analyses. We demonstrate that integrative modeling using available information about demographics, disease severity, physiological parameters, and medical interventions can outperform previous prediction models, highlighting the importance of our integrative (MPPM model) approach.
Concerning peritonitis treatment, there has been ongoing discussion for years as to whether open abdomen treatment is justified, or whether a so-called "second look on demand" makes more sense. Although this is not the main focus of this analysis, the data presented here show a relatively low mortality rate compared to publications showing the results of second-look on-demand patients [5,31,32]. Coccolini, from the "International Register of Open Abdomen", concluded that temporary abdominal closure remains reliable and safe as a treatment for severely injured and acute care surgery patients [33]. For peritonitis, the second major endpoint in case of an open abdomen is closure at the end of open treatment. In recent years, multiple working groups have put considerable effort into evaluating vacuum-assisted therapy as a treatment option for peritonitis in order to improve the closure rate [34,35]. The benefits of vacuum-assisted therapy are that the effort needed for repeated lavage treatments is minimized and the rate of patients with successful abdominal wall closure is higher. In our analysis the observed closure rate using open abdomen treatment was 87%, which is relatively high compared to studies using vacuum-assisted options [36-38]. Either therapy is futile, though, if the predicted outcome is bleak.
Outcome prediction is typically performed using single clinical and surgical markers and isolated scoring systems such as the SAPS II score. We showed that a combination of those dimensions outperforms predictions based on single indices. This was possible because the data analysis was based on a large single-center group of patients with peritonitis. The patients' individual factors were collected prospectively for the SAPS-II score, with further treatment-specific factors being added in a retrospective analysis.
The major advantage of this data analysis is its consistent cohort, with open abdomen treatment being performed uniformly over two decades. Intensive care strategies have also remained unchanged. As demonstrated in Fig. 1, there are only minor changes in the SAPS-II scoring evaluated over 20 years.

Limitations
Our work has several important limitations that deserve discussion. First, data were assessed retrospectively, with all inherent limitations driven by study design. While data were consistently documented in a timely manner following OR procedures, recall bias could theoretically apply. Importantly, the MPI was calculated from findings during surgery [27]. While this scoring system leaves room for interpretation and is logically limited in terms of power, the MPI was shown to be an accepted tool for mortality prediction [28]. It should be emphasized that only in-hospital mortality was analyzed.
Second, regardless of the power of our proposed prediction model, no single clinical scoring system should be a substitute for clinical decision-making. Nevertheless, although we regard it as a strength of our model that a multivariable approach deliberately includes important clinical variables, clinical decision-making should not solely be based on even such sophisticated models.
Using the SAPS-II score, the majority of vital signs (including oxygenation, renal function and results from blood samples) were included at the time of admission to the ICU [15]. In addition, the data availability changed in the middle of the observation period, when additional physiological (MPI score) and surgical predictors (i.e., wound-healing disorder) became available. We thus note that the comparison of univariable prediction models as well as the comparison of domain-specific models (for example, the physiological prediction model versus the surgical prediction model) are not based on the same patients but rather on different subsamples of the entire cohort.

Conclusion
Currently, prediction modeling mainly relies on single-index (or combination-index) analyses. With the multidomain peritonitis prediction model we demonstrate that integrative modeling using available information such as demographics, disease severity, physiological parameters, and medical interventions outperforms previous models. In the case of a severely compromised patient with peritonitis, our model suggests that the predictive power is best when all predictive parameters from the performance status are combined. This could lead to more reliable outcome prediction, and reflects the great importance of the interdisciplinary combination of surgical, laboratory, and clinical expertise, leading to improved decision-making in experienced physicians.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.