Development and validation of a preoperative “difficulty score” for laparoscopic transabdominal adrenalectomy: a multicenter retrospective study

Background A difficulty score for laparoscopic adrenalectomy (LA) is lacking in the literature. A retrospective cohort study was designed to develop a preoperative “difficulty score” for LA. Methods A multicenter study including 964 patients was conducted in four Italian tertiary centers for adrenal disease. The population was randomly divided into a training cohort (70% of the sample) and a validation cohort (30%). Patient, adrenal lesion, and surgeon characteristics, together with the type of procedure, were studied as potential predictors of the target events. Prolonged operative time (pOT), conversion to open surgery (cLA), or both were used as indicators of difficulty in three multivariate models. All models were developed in the training cohort and validated in the validation cohort. For all models, the ability to predict a complicated postoperative course was reported as the area under the curve (AUC). Logistic regression, reporting odds ratios (OR) with p-values, was used. Results In model A, gender (OR 2.04, p = 0.001), BMI (OR 1.07, p = 0.002), previous surgery (OR 1.29, p = 0.048), site (OR 21.8, p < 0.001) and size of the lesion (OR 1.16, p = 0.002), cumulative sum of procedures (OR 0.99, p < 0.001), and extended (OR 26.72, p < 0.001) or associated procedures (OR 4.32, p = 0.015) were significant predictors of pOT; all of them increased the risk except the cumulative sum of procedures, which reduced it. In model B, ASA (OR 2.86, p = 0.001), lesion size (OR 1.20, p = 0.005), and extended resection (OR 8.85, p = 0.007) increased the cLA risk. Model C had similar results to model A. All scores predicted the target events in the validation cohort (OR 1.99, p < 0.001; OR 1.37, p = 0.007; OR 1.70, p < 0.001 for scores A, B, and C, respectively). The AUCs in predicting complications were 0.740, 0.686, and 0.763 for models A, B, and C, respectively. Conclusion A difficulty score based on both pOT and cLA (model C) was developed using 70% of the sample and validated in a second cohort. Finally, the score was tested and was able to predict a complicated postoperative course.
Supplementary Information The online version contains supplementary material available at 10.1007/s00464-021-08678-6.

have been demonstrated, remarking a tutor's role during the training period [17]. However, a "difficulty score" for LA is lacking in the literature. The aim of this study was to develop a preoperative "difficulty score," analyzing a large series of patients who underwent LA in high-volume tertiary centers.

Methods
A multicenter retrospective observational study was undertaken at the Departments of General Surgery of Bologna (Alma Mater Studiorum-Policlinico S. Orsola-Malpighi), Brescia (Università di Brescia-ASST Spedali Civili), Ancona (Università Politecnica delle Marche), and Rome (Università La Sapienza). All of them are referral centers for adrenal surgery in Italy and have prospectively maintained databases. For each case, the indication for surgical treatment was approved by a multidisciplinary team, including surgeons, endocrinologists, radiologists, and pathologists dedicated to adrenal diseases. An anesthesiologist's evaluation completed the preoperative risk stratification. Data were extracted from prospectively collected databases and managed according to institutional rules, with the patient's consent. All patients undergoing LA from January 1994 to September 2020 were included in the study. Patients who underwent adrenalectomy with an open approach, exploratory laparoscopy, or surgery for recurrence of disease after LA were excluded a priori. The authors screened 976 records, and 12 patients were excluded for incomplete data. The final analysis included 964 patients.
For each record, the following perioperative data were extracted: characteristics of the patient (gender, age, body mass index, ASA score, comorbidities, previous abdominal surgery, presence of symptoms); characteristics of the adrenal lesion (side, size, presumptive diagnosis of functioning or non-functioning and benign or malignant lesion based on clinical-radiological data); characteristics of the surgeon (cumulative sum of procedures performed, classification as a junior or senior surgeon [17]); characteristics of the planned procedure (need for extended resection or other surgical procedures); intraoperative data (operative time, laparoscopic approach with or without need for conversion, blood loss); and postoperative data, such as complications according to the Clavien-Dindo classification (CDC) [18], resumption of enteral feeding, need for intensive care in the ICU, length of ICU stay, length of hospitalization, 90-day mortality, and histological diagnosis. It should be noted that the cumulative sum of procedures (CUSUM) was described for each surgeon as a progressive ordinal number. Thus, CUSUM reflects the experience of each operator at the time of each procedure.
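The CUSUM definition above (a progressive ordinal number per surgeon) can be illustrated with a minimal Python sketch. The record layout and the keys `surgeon` and `date` are hypothetical and are not taken from the study databases; the sketch only shows one way such a running case number could be derived.

```python
from collections import defaultdict
from datetime import date


def assign_cusum(procedures):
    """Return the records in chronological order with a per-surgeon case number.

    `procedures` is a list of dicts with the hypothetical keys
    'surgeon' and 'date' (assumed names, not from the study).
    """
    counts = defaultdict(int)
    result = []
    # Process cases chronologically so that the counter reflects the
    # experience accumulated up to and including the current procedure.
    for rec in sorted(procedures, key=lambda r: r["date"]):
        counts[rec["surgeon"]] += 1
        result.append({**rec, "cusum": counts[rec["surgeon"]]})
    return result


# Illustrative (invented) cases for two surgeons.
cases = [
    {"surgeon": "S1", "date": date(2001, 3, 1)},
    {"surgeon": "S2", "date": date(2001, 5, 9)},
    {"surgeon": "S1", "date": date(2000, 7, 15)},
    {"surgeon": "S1", "date": date(2002, 1, 20)},
]
cusum_rows = assign_cusum(cases)
for row in cusum_rows:
    print(row["surgeon"], row["date"], row["cusum"])
```

In this sketch the third procedure of surgeon S1 receives the value 3 regardless of how many procedures other surgeons performed in the meantime.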

Statistical analysis
All categorical variables were reported as frequencies and percentages, whereas continuous variables were reported as medians and interquartile ranges (IQR). An operative time above the 75th percentile (pOT) or conversion to open surgery (cLA) was considered indicative of difficulty. A complicated postoperative course, defined as CDC class > II, was used to test the utility of the difficulty scores. Three predictive models were built: (1) model A, in which all preoperative factors predicting pOT were studied; (2) model B, in which all preoperative factors predicting cLA were studied; (3) model C, in which all preoperative factors predicting either event (pOT or cLA) were studied. The analysis was carried out in three steps. First, preoperative variables were preselected using the least absolute shrinkage and selection operator (LASSO) method [19]. For the subsequent two steps, the cohort was divided into a training cohort (70% of patients) and a validation cohort (the remaining 30%). Patients were randomly assigned to the two subsets, independently of the center and the date of surgery, by a random number generator to avoid any time-dependent bias. Second, all models were fitted in the training cohort. Each model was represented graphically by a nomogram [20] and converted into a score. Third, the scores were validated in the validation cohort. Calibration was performed using the post-regression estimation of the marginal values.
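As a minimal sketch of the target-event definitions and of the random 70/30 split described above, the following Python code flags pOT, cLA, and the combined event, then shuffles the cases into two subsets. The keys `op_time_min` and `converted`, the seed, and the toy data are assumptions made for illustration; the percentile method used in the study is not stated, so the "inclusive" interpolation is also an assumption.

```python
import random
import statistics


def label_difficulty(cases):
    """Flag pOT, cLA, and the combined event for each case.

    Each case is a dict with the hypothetical keys 'op_time_min'
    and 'converted' (assumed names, not from the study).
    """
    times = [c["op_time_min"] for c in cases]
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    p75 = statistics.quantiles(times, n=4, method="inclusive")[2]
    labelled = []
    for c in cases:
        pot = c["op_time_min"] > p75
        cla = bool(c["converted"])
        labelled.append({**c, "pOT": pot, "cLA": cla, "pOT_or_cLA": pot or cla})
    return p75, labelled


def split_cohort(cases, train_fraction=0.7, seed=0):
    """Randomly split cases into training and validation subsets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    shuffled = cases[:]
    rng.shuffle(shuffled)      # ignores center and date of surgery
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Invented operative times (minutes); one converted case.
toy_times = [60, 80, 90, 100, 110, 120, 150, 200, 95, 105]
toy_cases = [{"id": i, "op_time_min": t, "converted": i == 4}
             for i, t in enumerate(toy_times)]

p75, labelled = label_difficulty(toy_cases)
train, valid = split_cohort(labelled)
print(p75, len(train), len(valid))
```

Because the shuffle does not look at the center or the date of surgery, early and late cases are expected to be spread across both subsets, which is the rationale given above for the random assignment.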
For each score, the diagnostic accuracy (AUC) was described and interpreted as follows: excellent > 0.9, good between 0.8 and 0.9, fair between 0.7 and 0.8, poor between 0.6 and 0.7, and failed < 0.6. The utility of the three models was tested by assessing their ability to predict a severely complicated postoperative course. All analyses were performed using logistic regression, reporting odds ratios (OR) and standard errors (SE). STATA 14 software (StataCorp., 2011) was used to carry out all analyses. All details are reported in the Supplementary methods.
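The AUC interpretation scale above can be sketched in Python as follows. The `auc` helper uses the rank-based (Mann-Whitney) definition of the area under the ROC curve, which is a standard equivalent formulation and not necessarily the routine used by the authors. How values falling exactly on a boundary (for example 0.8) are classified is not stated in the text, so the choice made here is an assumption.

```python
def auc(scores, events):
    """Rank-based AUC: probability that a randomly chosen case with the
    event has a higher score than one without it (ties count one half)."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interpret_auc(value):
    """Map an AUC to the categories used in the text.

    Assumption: boundary values are assigned to the higher category,
    except 0.9 itself, which is 'good' because 'excellent' requires > 0.9.
    """
    if value > 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    if value >= 0.6:
        return "poor"
    return "failed"


# Invented scores and outcomes, only to exercise the helper.
toy_auc = auc([1, 2, 3, 4], [0, 1, 0, 1])
print(toy_auc, interpret_auc(toy_auc))
```

For example, an AUC of 0.833 falls in the "good" band and an AUC of 0.710 in the "fair" band under this scale.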

Results
The entire cohort included 964 patients undergoing laparoscopic adrenalectomy with a transperitoneal approach. The breakdown by centers was the following: 51.7% of patients (498) from Ancona, 26.1% (252) from Bologna, 12.9% (124) from Brescia, and 9.3% (90) from Rome. One senior surgeon (AMP) participated in the University of Ancona case series from 1994 to 2002 and in the University of Rome case series from 2002 to 2020. Preoperative data are described in Supplementary Table 1.
The postoperative results are summarized in Supplementary Table 2.

First step: preselection of the covariates
The covariates potentially predicting pOT (model A) were age, gender, presence of symptoms, clinical-radiological diagnosis, side of the lesion, BMI, ASA score, previous surgery, lesion size, type of surgeon, CUSUM, and associated or extended surgery. The optimal lambda value was 0.037. The selection process is shown in Supplementary Fig. 1, panel A. The covariates potentially predicting cLA (model B) were the presence of symptoms, clinical-radiological diagnosis, side of the lesion, BMI, ASA score, previous surgery, lesion size, type of surgeon, and associated or extended surgery. The optimal lambda value was 6.113. The selection process is shown in Supplementary Fig. 1, panel B.
The covariates potentially predicting pOT or cLA (model C) were the same as those for model A. The optimal lambda value was 0.034. The selection process is shown in Supplementary Fig. 1, panel C.

Second step: analysis on training cohort
The multivariate analysis on a cohort of 679 patients (70% of the total) is reported in Tables 1, 2, and 3 for models A, B, and C, respectively.
The nomogram for model C is shown in Fig. 3. Score C ranges from a minimum of 0 to a maximum of 33.5 points. The AUCs were 0.833 ± 0.016, 0.710 ± 0.048, and 0.809 ± 0.019 for Scores A, B, and C, respectively (Supplementary Fig. 2 panels A, B, and C).

Third step: analysis of the validation and test cohort
Scores A, B, and C were validated on a cohort of 285 patients (30% of the total). Score A significantly predicted an increased risk of an operative time exceeding 140 min (OR 1.99 ± 0.19 for each point, p < 0.001). Supplementary Fig. 3 panel A shows the trend of the curve representing Score A. Score B significantly predicted an increased risk of conversion (OR 1.37 ± 0.16 for each point, p = 0.007). Supplementary Fig. 3 panel B shows the trend of the curve representing Score B. Score C significantly predicted an increased risk of an operative time exceeding 140 min or conversion (OR 1.70 ± 0.13 for each point, p < 0.001). Supplementary Fig. 3 panel C shows the trend of the curve representing Score C. The AUCs of the three models were 0.819 ± 0.015 for Score A, 0.6333 ± 0.0208 for Score B, and 0.820 ± 0.015 for Score C (Supplementary Fig. 4 panels A, B, and C). All three scores (A, B, and C) were significantly related (p < 0.001) to a complicated postoperative course, defined as CDC class > II: OR 1.29 ± 0.81, 1.72 ± 0.23, and 1.25 ± 0.06 for Scores A, B, and C, respectively. The AUC values in predicting severe postoperative complications were 0.740 ± 0.071, 0.686 ± 0.069, and 0.763 ± 0.415 for Models A, B, and C, respectively (Fig. 4).
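To make the "OR for each point" figures concrete, the sketch below shows how a per-point odds ratio from a logistic model compounds over several points: the odds are multiplied by the OR once per point. The baseline probability of 0.25 is a hypothetical value chosen only for the example and is not a result of the study.

```python
def odds_multiplier(or_per_point, points):
    """Factor by which the odds change after `points` additional points."""
    return or_per_point ** points


def updated_probability(baseline_probability, or_per_point, points):
    """Probability after adding `points` to the score, given a baseline.

    Converts probability to odds, applies the compounded OR,
    then converts back to a probability.
    """
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_multiplier(or_per_point, points)
    return new_odds / (1 + new_odds)


# Score C: OR 1.70 per point (from the validation cohort).
mult = odds_multiplier(1.70, 3)              # 1.70 ** 3, about 4.91
# Hypothetical baseline probability of 0.25 (assumed for illustration).
prob = updated_probability(0.25, 1.70, 3)    # about 0.62
print(round(mult, 3), round(prob, 3))
```

Under these assumptions, a three-point increase in Score C multiplies the odds of the target event by roughly 4.9; the resulting probability depends on the baseline, which is why the nomogram rather than the OR alone is needed to read off an absolute risk.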

Discussion
The present study demonstrated that some preoperative parameters are useful to predict the difficulty of LA. In this study, 964 laparoscopic adrenalectomies performed in four high-volume centers are described. To our knowledge, this is one of the largest cohorts in which the difficulty of LA has been evaluated. As in other experiences [21][22][23], the difficulty was measured using the operative time and conversion rate. The severe postoperative complication rate was used to confirm the utility of the scores. The perioperative transfusion rate was not used as an indicator of difficulty because transfusions were relatively rare events (3.5%) related to the conversion rate. Thus, a model based on transfusion rate could overlap with the model based on conversion rate without a gain in statistical power. Three separate models were developed: one for the operative time (A), one for the conversion (B), and one for both (C). Each model was studied in a training cohort (70% of the sample) and confirmed in a validation cohort (30% of the sample). The results observed in the training cohort were those expected based on literature data both for operative time [14][15][16][17][23][24][25][26][27][28] and conversion rate [20][21][22]. Indeed, male gender, high BMI, previous abdominal surgery, bilaterality of the lesion, size of the lesion [29][30][31][32], associated surgical procedures, and need for extended resections prolonged the operative time, increasing the probability of exceeding the 75th percentile. The surgeon's experience, on the contrary, reduced the probability of a prolonged operative time. The size of the lesion, the ASA score, and the need for extended resection increased the probability of conversion. All three models are clinically plausible, easily computable using a simple nomogram (Figs. 1, 2, 3), and provide a numerical score related to the probability of the target events.
However, the models differed in accuracy: models A and C had good accuracy, whereas model B had only fair accuracy. The models were validated and calibrated using the second cohort of patients, confirming the statistical plausibility of the results. In the validation cohort, the AUCs of models A and C were confirmed to be good, correctly classifying eight out of every ten patients tested. On the contrary, model B was less accurate, correctly classifying only six out of every ten. Therefore, the most useful model to predict a difficult LA seems to be model C because it demonstrated a high AUC and, at the same time, the ability to predict both adverse events (conversion or prolonged operative time). This finding was not surprising: a model based only on the conversion rate could not include all "difficult" procedures. In other words, a challenging procedure performed by a skilled laparoscopic surgeon is not necessarily converted. A second interesting result was that both models A and C take many types of factors into consideration. Indeed, the scores included both patient and disease characteristics, as well as the type of procedure planned and the surgeon's experience. In practice, the score could help the chief surgeon plan procedures and counsel patients appropriately. The utility of the models was further underlined by the correlation of scores A and C with the probability of a complicated postoperative course.
This study has some limitations. First, the design of the study was retrospective. However, all databases come from high-volume centers and are prospectively maintained. Moreover, all postoperative data suggested a high quality of surgery, with very low conversion and postoperative complication rates. A second limitation was the long enrollment period and the related changes in surgeon training and medical devices over the last 20 years. This bias was partially mitigated, and studied, by dividing the surgeons into first- and second-generation surgeons according to the period in which they were trained [17].
Moreover, time-dependent bias, such as a different distribution of significant factors, was addressed by the study design. Indeed, the entire cohort was randomly divided, independently of the center and date of surgery, into a training and a validation cohort. A third limitation was the low number of target events (converted or prolonged procedures) used to build the scores. Indeed, the low number of events could affect the robustness of multivariate models.
Nevertheless, the use of the LASSO approach addressed overfitting by reducing, when necessary, the number of covariates for the multivariate analysis. A further limitation was the applicability of the difficulty score only to transperitoneal approaches. All involved centers performed LA using the anterior or lateral transperitoneal approach, and for this reason, the model was not validated for the retroperitoneal approach.
In conclusion, we reported, validated, and tested, for the first time, a difficulty score for laparoscopic adrenalectomy. The obtained models, particularly model C, could predict two critical events: conversion to open surgery and prolonged operative time. The score for each model corresponds to the probability that the target event may occur. It is simple to calculate preoperatively, practical to use, and could be used not only to select the surgical team but also to predict and help avoid a complicated postoperative course. External validation is recommended to further confirm these results.