Background

Colorectal cancer is the third most common cancer in the world and the second leading cause of cancer death, with rectal cancer (RC) accounting for one-third of the total [1, 2]. Despite the availability of various therapeutic options, surgical resection remains the primary treatment for early- to mid-stage and even some advanced RC [3]. Laparoscopic total mesorectal excision (LaTME) is recognized as the surgical technique of choice for RC, with pathologic safety and overall survival no worse than those of open surgery [4]. However, because the procedure is complex and technically demanding, intraoperative conversion to open surgery and complications are relatively common. Although these events do not indicate surgical failure, they significantly affect prognosis [4, 5]. Preoperative evaluation of surgical difficulty is therefore particularly important.

Multiple studies have examined the effect of obesity on surgical difficulty, most of them focusing on BMI [6, 7, 8]. Although BMI estimates generalized obesity, it does not accurately reflect its relationship with surgical difficulty, and the assessment of abdominal fat is therefore considered more informative than BMI [9]. The use of computed tomography (CT) to measure the area and mean radiodensity of abdominal visceral adipose tissue and subcutaneous adipose tissue has been validated in the majority of studies [10, 11], which supports its reliability. Including the area and radiodensity of both visceral and subcutaneous adipose tissue in this study allows a more comprehensive assessment of obesity in relation to surgical difficulty.

Machine learning (ML), a rapidly growing branch of artificial intelligence (AI), is increasingly applied to healthcare data analysis. ML excels at relating multiple variables and accurately predicting outcomes. Consequently, several ML prediction models are being adopted for disease diagnosis, prognosis prediction, and clinical decision-making [12, 13]. The objective of this study was to use machine learning together with a nomogram to establish a preoperative prediction model for the surgical difficulty of laparoscopic rectal cancer surgery, to identify independent preoperative predictors of that difficulty, and thereby to help surgeons develop personalized surgical plans for their patients before the operation.

Materials and methods

Patients

The Ethics Committee of the Second Affiliated Hospital of Soochow University approved our retrospective study. It conforms to the 1964 Helsinki Declaration of the World Medical Association and its subsequent revisions. This study included 186 patients who underwent laparoscopic total mesorectal excision (LaTME) from January 2018 to December 2020 in the Department of Gastrointestinal Surgery of the Second Affiliated Hospital of Soochow University.

The inclusion criteria were:

(1) Preoperative pathological examination confirming the diagnosis of rectal cancer; (2) a complete computed tomography (CT) scan and clinical data within two weeks before surgery; (3) a preoperative plan for laparoscopic total mesorectal excision (LaTME).

The exclusion criteria were:

(1) Emergency surgery; (2) open surgery; (3) preoperative adjuvant treatment such as radiotherapy or chemotherapy; (4) clinical stage IV disease or a large, inoperable tumor.

Clinicopathologic parameters were retrospectively collected from the medical record database. The Clavien-Dindo classification [14] was used to classify short-term postoperative complications.

Data collection

For patients included in the study, the following parameters were retrospectively collected from our hospital’s electronic medical record system: (1) Basic patient characteristics: age, gender, BMI, American Society of Anesthesiologists (ASA) score, and comorbidities (hypertension, diabetes). (2) Laboratory results: albumin, hemoglobin, triglycerides, total cholesterol, C-reactive protein, and Systemic Inflammatory Grade (SIG) [15] within two weeks before surgery. (3) Intraoperative data: surgical time and the need for conversion. (4) Postoperative data: pathological results, length of hospital stay, and postoperative complications classified according to the Clavien-Dindo classification.

CT abdominal adipose tissue measurements

Visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) parameters were calculated and averaged using Slice-O-Matic software (version 5.0; TomoVision) at the level of the fourth and fifth lumbar vertebral interspace by selecting two consecutive CT cross-sections (5 mm). Cross-sectional areas were delineated based on anatomical knowledge and tissue-specific Hounsfield unit (HU) ranges of −150 to −50 HU for VAT and −190 to −30 HU for SAT [16]. For the two CT cross-sections of each patient, regions of interest were outlined separately by two individuals trained in the use of the software, and the results were averaged (Supplementary Fig. S1). If the two outlines differed substantially, a third person reviewed and verified the measurements.
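The threshold-based measurement described above can be illustrated with a minimal Python sketch. It is not the Slice-O-Matic procedure: it assumes the region of interest has already been delineated, that the slice is available as a grid of HU values, and that the pixel spacing is known. The function name, the toy slice, and the 1 mm pixel spacing are illustrative assumptions.

```python
def adipose_area_and_density(hu_slice, pixel_spacing_mm, hu_min, hu_max):
    """Return (area_cm2, mean_hu) for pixels whose HU lies in [hu_min, hu_max].

    hu_slice: 2D list of HU values inside the delineated region
              (pixels outside the region are given as None).
    pixel_spacing_mm: (row_spacing, col_spacing) in millimetres.
    """
    selected = [
        hu
        for row in hu_slice
        for hu in row
        if hu is not None and hu_min <= hu <= hu_max
    ]
    # 1 cm^2 = 100 mm^2
    pixel_area_cm2 = (pixel_spacing_mm[0] * pixel_spacing_mm[1]) / 100.0
    area_cm2 = len(selected) * pixel_area_cm2
    mean_hu = sum(selected) / len(selected) if selected else float("nan")
    return area_cm2, mean_hu


# HU windows quoted in the text.
VAT_RANGE = (-150, -50)
SAT_RANGE = (-190, -30)

# Toy 2 x 3 "slice" with 1 mm x 1 mm pixels (illustrative values only).
toy_slice = [[-100, -60, 20], [-160, None, -90]]
vat_area, vat_mean = adipose_area_and_density(toy_slice, (1.0, 1.0), *VAT_RANGE)
```

In practice the values from the two cross-sections and the two readers would then be averaged, as described in the text.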

Surgical difficulty criteria

The scoring criteria were adapted from those established by Escal et al. [17] and refined using the finding of Seki et al. [9] that patients with a visceral fat area/body surface area (VFA/BSA) ratio ≥ 85 cm²/m² present greater surgical challenges. The scoring system comprises five factors related to surgical difficulty, each weighted on the basis of clinical experience. Scores range from 0 to 10 points and define two categories: a cumulative score below 3 points indicates non-difficult surgery, while a score of 3 points or higher indicates difficult surgery (Table 1). The five variables associated with surgical difficulty differed significantly between the surgical-difficulty group and the non-surgical-difficulty group (P < 0.05) (Supplementary Table S1).

Table 1 Surgical difficulty grading
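The grouping rule can be sketched in a few lines of Python. Only the two cutoffs (a total score of 3 points, and VFA/BSA ≥ 85 cm²/m²) come from the text; the factor names and point values in the example are placeholders, since the per-factor weights are defined in Table 1, and the body surface area is taken as an input because the formula used to compute it is not stated.

```python
DIFFICULTY_CUTOFF = 3     # total score >= 3 -> difficult surgery (from the text)
VFA_BSA_CUTOFF = 85.0     # cm^2/m^2 (from the text, after Seki et al.)


def vfa_bsa_exceeds_cutoff(vfa_cm2, bsa_m2):
    """True when visceral fat area divided by body surface area is >= 85 cm^2/m^2."""
    return vfa_cm2 / bsa_m2 >= VFA_BSA_CUTOFF


def classify_difficulty(factor_points):
    """Sum the per-factor points and apply the cutoff.

    factor_points: mapping of factor name -> points already assigned
    according to Table 1 (the names used below are placeholders).
    """
    total = sum(factor_points.values())
    if not 0 <= total <= 10:
        raise ValueError("total score must lie between 0 and 10")
    label = "difficult" if total >= DIFFICULTY_CUTOFF else "non-difficult"
    return total, label


# Hypothetical patient: factor names and point values are illustrative only.
example = {"factor_a": 1, "factor_b": 0, "factor_c": 2, "factor_d": 0, "factor_e": 0}
total, label = classify_difficulty(example)
```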

Establishment of machine learning models

We used R software (version 4.3.1) with a fixed random seed to randomly divide all patients at a ratio of 7:3 into a training cohort (n = 131) and a validation cohort (n = 55). To simplify the machine learning models and improve their generalization, LASSO regression with 10-fold cross-validation was used to reduce the number of variables. Variables were filtered at lambda.min, with the best model fit observed at lambda.min = 0.007424. Fifteen of the twenty-one variables, those with the highest predictive power for surgical difficulty, were selected (Fig. 1). Age [18] and ASA score [19] have also been shown to affect the performance of the procedure and were therefore included as model predictor variables.
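A reproducible split of this kind can be sketched as follows. The study performed the split in R; this Python version only illustrates the idea of a seeded shuffle, and the seed value and the function name are arbitrary assumptions. The cohort sizes (131 and 55) are those reported above.

```python
import random


def split_cohort(patient_ids, n_train, seed):
    """Shuffle IDs with a fixed seed and return (training, validation) lists."""
    rng = random.Random(seed)   # local generator: reproducible, no global state
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


patient_ids = list(range(1, 187))   # 186 patients, as in the study
# The seed 2024 is an arbitrary illustrative choice, not the study's seed.
train_ids, valid_ids = split_cohort(patient_ids, n_train=131, seed=2024)
```

Because the generator is seeded, calling the function again with the same arguments returns the same two cohorts.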

Fig. 1

(A) LASSO coefficient profile for the 21 variables. (B) Selection of the best penalty coefficient λ in the LASSO model, using 10-fold cross-validation based on the minimum criterion. The dashed line on the left represents lambda.min and the dashed line on the right represents lambda.1se.
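The two penalty values marked in panel B follow a common convention: lambda.min is the penalty with the lowest mean cross-validated error, and lambda.1se is the largest penalty whose error stays within one standard error of that minimum. A minimal Python sketch of this selection rule is given below; the function name and all numbers are illustrative assumptions and are not the study's cross-validation results.

```python
def select_lambdas(lambdas, cv_mean_error, cv_std_error):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validation results."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean_error[i])
    lambda_min = lambdas[best]
    threshold = cv_mean_error[best] + cv_std_error[best]
    # Largest penalty whose mean error is still within one SE of the minimum.
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean_error) if err <= threshold
    )
    return lambda_min, lambda_1se


# Illustrative numbers only.
lambdas = [0.001, 0.005, 0.01, 0.05, 0.1]
cv_mean = [0.62, 0.55, 0.57, 0.60, 0.80]
cv_se = [0.05, 0.04, 0.04, 0.05, 0.06]
lam_min, lam_1se = select_lambdas(lambdas, cv_mean, cv_se)
```

Choosing lambda.min, as in this study, keeps more variables than lambda.1se would, because the smaller penalty shrinks fewer coefficients to zero.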

In this study, four machine learning models were selected to predict surgical difficulty: support vector machine (SVM), random forest (RF), logistic regression (LR), and decision tree (DT). In the training cohort, we tuned the hyperparameters of each model individually to prevent overfitting or underfitting, and the same hyperparameters were used in the validation cohort to assess the predictive ability of the models.

Statistical analysis

Statistical analyses were conducted using R software (version 4.3.1). Normally distributed continuous data are presented as mean and standard deviation (SD), and non-normally distributed continuous data as median and interquartile range (IQR). Categorical data are presented as frequency and percentage (%). T-tests were used to compare continuous variables, and χ² tests to compare categorical variables. A predictive nomogram was developed. Variables for the nomogram were selected by backward stepwise selection using the Akaike information criterion, and factors with P < 0.05 in multivariate logistic regression were included. Calibration curves were plotted to evaluate the calibration of the nomogram. Furthermore, Harrell’s C-index was calculated, and bootstrap validation of the nomogram was performed (1000 bootstrap resamples) to compute a relatively corrected C-index. Decision curve analysis (DCA) was also conducted. Statistical significance was defined as P < 0.05.
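For readers unfamiliar with decision curve analysis, the quantity plotted on its y-axis is the net benefit at a given threshold probability p_t, conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below computes it in Python for a model and for the "treat all" reference strategy; the function names and the toy data are illustrative assumptions, and the study's own analysis was carried out in R.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


# Toy data (illustrative only): 1 = difficult surgery, 0 = non-difficult.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its curve lies above both the "treat all" and the "treat none" (net benefit 0) strategies.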

Results

Patient characteristics

A total of 186 patients were included in the final analysis: 112 males (60.22%) and 74 females (39.78%). The median age was 66 years and the median BMI was 23.35 kg/m²; 76 patients had comorbid hypertension and 21 had diabetes mellitus. The tumor was > 3 cm in diameter in 101 patients and was ≥ 10 cm from the dentate line in 88. The median VAT radiodensity was −94.08 HU, the median SAT radiodensity was −97.34 HU, and the median SAT area was 127.55 cm². Preoperative serum albumin was ≥ 35 g/L in 147 patients. There were no significant differences in clinical characteristics or CT parameters between the training and validation cohorts (P > 0.05) (Table 2).

Table 2 Clinical characteristics and CT parameters in the training cohort and validation cohort

Construction of machine learning models

Four machine learning models were selected for this study: support vector machine (SVM), random forest (RF), logistic regression (LR), and decision tree (DT). Candidate hyperparameters were evaluated by repeating the cross-validation five times, considering accuracy and AUC, and the final hyperparameters were then set by manual tuning. We also obtained the sensitivity, specificity, precision, recall, and F1 score of each model (Table 3). Using AUC as the evaluation criterion, the SVM performed best in the training cohort, with AUC = 0.995 (0.988–1.000), followed by LR with AUC = 0.994 (0.987–1.000), DT with AUC = 0.970 (0.943–0.996), and RF with AUC = 0.963 (0.935–0.999). The SVM also performed best in the validation cohort, with AUC = 0.987 (0.962–1.000), followed by RF with AUC = 0.953 (0.901–1.000), LR with AUC = 0.950 (0.889–1.000), and DT with AUC = 0.904 (0.805–1.000) (Fig. 2). The DeLong test was used to compare the predictive efficacy of the four models; it showed no significant differences among them (P > 0.05), and all four demonstrated high predictive performance (Supplementary Table S2).

Fig. 2

Evaluation of the receiver operating characteristic curve (ROC) performance for four machine learning models based on the area under the receiver operating characteristic curve (AUC) in training (A) and validation (B) cohorts

Table 3 Performance of four machine learning models in training and validation cohorts
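The performance measures reported for the four models can be computed from predicted scores and labels as in the following Python sketch. It uses the rank-based definition of the AUC (the probability that a randomly chosen difficult case receives a higher score than a randomly chosen non-difficult case) and the usual confusion-matrix formulas. The function names, the toy data, and the 0.5 classification threshold are illustrative assumptions; they are not the study's data or settings.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_pred):
    """Sensitivity (recall), specificity, precision and F1 from binary labels.

    Assumes both classes are present and at least one positive prediction.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy data (illustrative only): 1 = difficult surgery, 0 = non-difficult.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]   # assumed 0.5 threshold
auc = auc_score(y_true, y_score)
metrics = classification_metrics(y_true, y_pred)
```

Note that sensitivity and recall are the same quantity, so the two columns carry identical values for any given model.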

Results of univariate and multivariate logistic regression analysis

After univariate and multivariate logistic regression, BMI (OR: 1.52, 95% CI: 1.10–2.11), SAT area (OR: 1.02, 95% CI: 1.01–1.04), VAT radiodensity (OR: 1.34, 95% CI: 1.16–1.56), distance between the tumor and the dentate line < 10 cm (OR: 0.03, 95% CI: 0.01–0.21), tumor diameter > 3 cm (OR: 0.14, 95% CI: 0.03–0.82), and comorbid hypertension (OR: 0.19, 95% CI: 0.04–0.83) were identified as independent risk factors for surgical difficulty (Table 4).

Table 4 Univariate and multivariate logistic regression analysis between the training and validation cohorts
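Odds ratios such as those above are obtained by exponentiating the logistic regression coefficients. The Python sketch below shows the conversion together with a Wald-type 95% confidence interval; whether the study used Wald intervals is not stated, so this is an assumption, and the coefficient and standard error in the example are hypothetical rather than taken from Table 4.

```python
import math


def odds_ratio_with_ci(beta, std_error, z=1.96):
    """Convert a logistic regression coefficient to an odds ratio with a Wald CI."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_error),
        math.exp(beta + z * std_error),
    )


# Hypothetical coefficient and standard error, not taken from Table 4.
or_, lower, upper = odds_ratio_with_ci(beta=0.40, std_error=0.15)
```

An odds ratio above 1 corresponds to a positive coefficient (higher odds of difficult surgery as the variable increases), and an odds ratio below 1 to a negative coefficient, so the direction of each effect depends on how the variable and its reference category were coded.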

Development and validation of nomogram

To make the evaluation more intuitive, we created a nomogram based on logistic regression, because logistic regression makes the nomogram easier to interpret. Variable selection for the nomogram used backward stepwise screening with the Akaike information criterion and required P < 0.05; BMI, tumor distance from the dentate line, tumor diameter, VAT radiodensity, SAT area, and comorbid hypertension were included in the nomogram (Fig. 3). The C-indices were all greater than 0.9, indicating that the model's predictions are highly reliable (Fig. 4a). Furthermore, decision curve analysis (DCA) showed that the predictive model provided a markedly better net benefit (Fig. 4b). The calibration curves for the nomogram demonstrated favorable concordance (1000 bootstrap repetitions; mean absolute error 0.042 in the training cohort and 0.039 in the validation cohort) (Fig. 5). We also performed the Hosmer–Lemeshow test, which indicated a good fit in both cohorts (training cohort P = 0.853; validation cohort P = 0.400).

Fig. 3

Nomogram for predicting surgical difficulty

Fig. 4

(a) Receiver operating characteristic (ROC) curves for the prediction of surgical difficulty in the training and validation cohorts. The training cohort is indicated by the solid black line, and the validation cohort by the solid red line. (b) Clinical utility evaluated by decision curve analysis (DCA). The y-axis represents the net benefit, and the x-axis represents the threshold probability. The training cohort is indicated by the solid blue line, and the validation cohort by the solid red line.

Fig. 5

Calibration curves for the model provide a predictive risk assessment of surgical difficulty for both the training cohort (a) and the validation cohort (b). The solid line in the figure indicates the performance of the predictive model, with closer proximity to the diagonal dashed line indicating more accurate predictions. The calibration curves for the training cohort and validation cohort are in high agreement with the fitted line, indicating the high accuracy of the nomogram

Discussion

In this study, four machine learning models were developed and validated for predicting the difficulty of LaTME. All four models showed high performance; the SVM achieved the highest AUC in both the training and validation cohorts, although it did not differ significantly from the other models. To make the assessment of surgical difficulty more intuitive, we also constructed a logistic regression-based nomogram, with which surgeons can calculate the probability of surgical difficulty and make adequate preoperative and intraoperative preparations.

LaTME is perhaps the most challenging procedure in colorectal surgery. Appropriate pelvic debridement and total mesorectal excision (TME) are essential to prevent local recurrence [20]. Previous studies have focused on the influence of pelvic factors and rectal mesenteric fat area on the surgical outcome of lower and middle rectal cancer [21, 22]. The influence of a large vertical pelvic depth, a small pelvis, a short transverse meridian, a large sacrococcygeal curvature, and a high rectal mesenteric fat area on the difficulty of the procedure has also been established [17, 23, 24], and these factors are particularly significant in men [25]. However, the relationship between quantitative pelvic measurements and surgical difficulty remains uncertain, and some studies have found no association between them [26]. Our study demonstrated that abdominal adipose tissue independently influences the difficulty of laparoscopic rectal cancer surgery. The assessment of abdominal visceral fat is important because visceral fat can affect the intraoperative separation of the visceral layer from the abdominal fascia, the exposure of the rectal vessels, and the smoke generated when applying the ultrasonic knife [27]. Therefore, we included VAT in the surgical difficulty assessment score to further refine the preoperative assessment of surgical difficulty. We found a significant difference in VAT radiodensity between the two groups of patients. According to one study, fat radiodensity is also associated with overall survival and mortality in colorectal cancer, possibly because of inflammation, browning of the adipose tissue, and edematous disorders [16, 28]. We therefore hypothesize that the influence of fat radiodensity on surgery may be related to the same mechanisms, although this requires further exploration.

BMI is considered the most common indicator of overall obesity [29, 30], and previous studies have shown that a high BMI has a significant effect on postoperative outcomes after rectal surgery [4, 31, 32]. In recent years, however, an “obesity paradox” has emerged [33, 34, 35]. A meta-analysis showed that obese patients (including class I/II/III) had a lower mortality rate within 30 days than patients with normal BMI but a higher mortality rate after 30 days [33], which reflects the limitations of BMI and the value of our study.

Our study also indicated that surgery is more difficult in hypertensive patients. One study has shown that the prevalence of hypertension is higher in patients with abdominal visceral obesity [36], so the association we observed may still reflect abdominal visceral obesity. We did not investigate this further and have no direct evidence for it; we hope to report on it in the future.

Escal et al. [17] and other studies [37] included blood loss in the scoring criteria. We considered that in some operations, where factors such as abdominal lavage interfere, the measurement of blood loss may not be completely accurate, which could lead to misclassification of surgical difficulty in some patients. Therefore, to make the prediction model more objective, more persuasive, and easier to generalize, we did not include blood loss among the scoring factors, although we do not deny that intraoperative blood loss may reflect the difficulty of surgery to a certain extent. In the future, we will also include standardized measurements of blood loss in the scoring criteria.

In addition, robotic total mesorectal excision is now widely used, and several studies have shown that it is non-inferior, and in some respects even superior, to laparoscopic total mesorectal excision in terms of both short-term postoperative complications and overall survival [25, 38, 39, 40, 41]. The rigidity of laparoscopic instruments and the limited operating space may affect specimen quality, and the robotic device may overcome these disadvantages, resulting in better specimen quality and reduced local recurrence [42]. Interestingly, VFA and rectal mesenteric adipose tissue did not have significant clinical significance for the postoperative pathological safety of robotic total mesorectal excision [43, 44], a finding that needs to be validated in a large multicenter sample.

Laparoscopic and open surgery are still the dominant procedures for the treatment of rectal cancer [45], and this study therefore remains relevant. In addition to the factors we investigated, several studies have demonstrated that a history of previous abdominal surgery, preoperative radiotherapy, the surgeon’s proficiency, the patient’s preoperative nutritional status, and other factors can influence the difficulty of laparoscopic surgery [46, 47]. Although various factors influence the difficulty of surgery, we believe that the scoring criteria proposed by Escal et al. [17] are more objective. We expect that more factors will be included in the scoring of surgical difficulty in the future, further improving the objectivity and persuasiveness of the scoring and providing ideas for the evaluation of other operations.

This study has several limitations. First, it is a retrospective study with a small number of enrolled patients, so selection bias cannot be completely ruled out, and a larger sample from more centers is needed for further validation. Second, laboratory tests, clinicopathologic features, and abdominal CT parameters were included in this study, but because CT measured only the mean area of two planes of the abdomen and not the volume of abdominal fat, measurement errors may have arisen. Third, we selected only some clinical biochemical indexes, and factors that were not included may have led to residual confounding.

Conclusions

This study developed four ML models for evaluating surgical difficulty, all of which showed excellent performance, and a logistic regression-based nomogram was created to make the evaluation more intuitive. Both the training cohort and the validation cohort confirmed the excellent performance of the models, which provide clinicians with easy-to-use tools to support accurate surgical decisions. Further validation in large, multicenter samples is needed to confirm the predictive performance.