Abstract
Data analytics and artificial intelligence (AI) have been used to predict patient outcomes after colorectal cancer surgery. A prospectively maintained colorectal cancer database was used, covering 4336 patients who underwent colorectal cancer surgery between 2003 and 2019. The 47 patient parameters included demographics, peri- and post-operative outcomes, surgical approaches, complications, and mortality. Data analytics were used to compare the importance of each variable and AI prediction models were built for length of stay (LOS), readmission, and mortality. Accuracies of at least 80% have been achieved. The significant predictors of LOS were age, ASA grade, operative time, presence or absence of a stoma, robotic or laparoscopic approach to surgery, and complications. The model with support vector regression (SVR) algorithms predicted the LOS with an accuracy of 83% and mean absolute error (MAE) of 9.69 days. The significant predictors of readmission were age, laparoscopic procedure, stoma performed, preoperative nodal (N) stage, operation time, operation mode, previous surgery type, LOS, and the specific procedure. A BI-LSTM model predicted readmission with 87.5% accuracy, 84% sensitivity, and 90% specificity. The significant predictors of mortality were age, ASA grade, BMI, the formation of a stoma, preoperative TNM staging, neoadjuvant chemotherapy, curative resection, and LOS. Classification predictive modelling predicted three different colorectal cancer mortality measures (overall mortality, and 31- and 91-days mortality) with 80–96% accuracy, 84–93% sensitivity, and 75–100% specificity. A model using all variables performed only slightly better than one that used just the most significant ones.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Colorectal cancer is the third most common cancer by incidence, with over 1.8 million new cases in 2018, and the second most common cause of cancer death when the sexes are combined [1]. Around 147,950 and 42,300 new cases of colorectal cancer were predicted for the USA and UK respectively in 2020 [2, 3]. Moreover, it is estimated that there will be around 2.4 million new cases worldwide in 2035 [4]. The reasons for this growth in cases are unclear and are the subject of clinical and basic research [5].
This article considers a range of factors affecting the patient outcomes after surgery, covering both the patient’s individual characteristics and the nature of the surgery. The patient characteristics include performance status (ASA grade) and BMI, reflecting prior work that shows the effects of obesity on a range of conditions, including cancers [6].
LOS, readmission and mortality are essential proxies of quality of care in surgery [7,8,9,10]. Shorter LOS could potentially minimise healthcare costs, free up hospital beds, improve productivity, reduce the risk of nosocomial infections and improve quality of life. Increased readmission rates have a huge impact on healthcare costs. Readmission within 30 days annually costs around 40 billion [11, 12]. Moreover, a higher readmission rate indicates poor discharge planning and post-operative morbidity, with a clinical and psychological impact on the patient. Overall mortality rates following colorectal surgery range from 1 to 16.4% [13,14,15]. The National Cancer Intelligence Network found that the 30-day post-operative mortality rate is falling across England, with the overall post-operative mortality rate of 6.7%. This rate improved over the study period from 6.9% in 1998 to 5.9% in 2006 [16]. The National Bowel Cancer Audit in their recent annual report found a downward trend in 90-day post-operative mortality with a rate of 3.0%. The study also showed that 90-day post-operative mortality has reduced from 2.3% in 2013/14 to 1.7% in 2017/18 for elective surgery and from 14.2 to 11.5% for emergency surgery [17].
The economic impact of colorectal cancer on healthcare systems is intense. The US is expected to spend around US$17.41 billion on colorectal cancer, and approximately US$4.2 billion in productivity lost to deaths related to colorectal cancer in 2020 [18]. In the UK, the cost of diagnosing and treating colorectal cancer patients is significant (€40,000 per case) [19] and exacerbates funding constraints on the National Health Service (NHS) [8]. With limited resources and a finite surgical bed capacity in many hospitals, it is extremely important to know the expected LoS, readmission rate and mortality after elective CRC surgery.
In recent times, Artificial Intelligence (AI) and Machine Learning (ML) techniques have shown great promise in the diagnosis and prognosis of various diseases and health conditions [20, 21]. ML aims to discover patterns from data without explicit programming. ML algorithms are used to model and learn important properties from data, including the stochastic dependency between a set of input and output variables. ML is a data-driven technique that has the benefit of integrating multiple risk factors into a prediction model [22]. Meanwhile, ML techniques have been found useful in detecting colorectal cancer in advance where the model was constructed with blood cell count, age and sex as input features [23].
An accurate prediction of LOS, readmission and mortality would help healthcare professionals with planning, decision making and building strategies. This will eventually lead to improved patient care and prevent readmission and mortality after discharge [24]. A prediction model that could predict readmission accurately would help healthcare professionals to intervene in readmission scenarios and provide better patient care. A model with the ability to predict the mortality would be valuable to patients, surgeons and healthcare institutions. An accurate mortality prediction model would contribute to patient risk stratification, preoperative consultation with the patient and family members, decision-making process, consent and professional accountability.
This study has investigated the scope of AI and data analytics in predicting LOS, readmission and mortality in colorectal cancer patient’s treated in a large NHS trust. Predictor variables of LOS, readmission and mortality have been explored using data analysis to determine which predictors are most important. Machine learning algorithms were then investigated as predictive tools. A comparison has been made between using all variables as predictors for machine learning versus using just the most significant variables.
2 Methodology
Codes and findings of feature selection experiments are available in the github repository [25]. The uploaded file in the repository is currently protected with a password, ‘colorectalai’.
2.1 Data
Records of a prospectively maintained colorectal cancer database by the Colorectal Department in a large NHS Trust were examined. The dataset contains 4336 patients who underwent colorectal cancer surgery between 2003 and 2019. The 47 patient parameters/variables included demographics, peri- and post-operative outcomes, surgical approaches, complications and mortality (see Table 1). Table 2 shows various classes associated with variables in the dataset. The table shows the four different classes of procedure, 17 different classes of specific procedure, two classes of laparoscopic type, surgical approach (open or laparoscopic), robotic (yes/no), operation mode (elective or emergency), and complications (yes/no). Both intra- and post-operative complications are considered together.
Descriptive statistics of some variables that summarise the central tendency, dispersion and shape of the distribution of a dataset, excluding NaN values, can be seen in Table 3. There were 2494 male and 1942 female patients in the dataset. Among these 4336 cases, 74% (3209) were curative, 13.35% (579) were palliative, and 6.53% (283) were uncertain. 80% (3475) of the surgeries performed were elective in comparison to 18% (782) of emergency. Assistance from the robot was considered for 8.9% (388) of the cases. The laparoscopic approach was applied to 57.45% (2491) of the cases, whereas open surgery was used in 35.79% (1552). The 30-day readmission rate was 7.4%. Moreover, the 30- and 90-day mortality was 3.39% (147) and 5.93% (257) respectively.
2.1.1 Data processing
The problems of missing values and mixed data types were dealt with using appropriate techniques and with the help of medical domain knowledge through discussions with clinicians. Following clinical discussion, missing values were filled with different techniques (see Table 4). The dataset consists of some columns where data types are mixed. For example, Sex and TumICD10 variables have both text and numeric data. In order to fit them to machine learning algorithms, all these mixed data types are converted to numeric data using the Pandas Series.str.replace() method [26]. Moreover, there are some columns that comprise text values only (e.g., Robotic, Radiotherapy). All these columns that consist of text data are passed through the LabelEncoder methods of scikit learn [27] to convert them to numeric data.
2.2 Prediction model building
2.2.1 Model for LOS
Regression predictive modelling was performed to predict the LOS. The data were power-transformed to make them more Gaussian-like [28]. Then the data were discretized to map numerical variables onto discrete values. Such mapping creates a high-order ranking of values that can smooth out the relationships between observations and is found useful for machine learning [29]. A 10-fold cross-validation technique [30, 31] was used for splitting the training and test data. Different algorithms were compared to find the optimal model for LOS prediction. A negative mean absolute error was used as the evaluation metric to compare different algorithms. Comparison between algorithms shows that support vector regression (SVR) outperformed the other algorithms (see Fig. 1). Following this finding, different parameters of the SVR algorithms (see Table 5) were tuned using the GridSearchCV technique [32]. The model was trained with the training dataset and tested on the test dataset. Finally, different evaluation metrics, namely root-mean-square error (RMSE), mean absolute error (MAE), and accuracy, were used to evaluate the model. Data analysis of different variables in predicting LOS was also conducted.
2.2.2 Model for readmission
Classification predictive modelling was performed to predict the readmission. Models with Random Forest (RF), K Nearest Neighbor (KNN), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Bidirectional Long Short-Term Memory (BI-LSTM) algorithms were compared for readmission prediction. For the BI-LSTM algorithm, the data are reshaped following the work of Masum et al. [33, 34] as the LSTM based RNN requires input to be in a matrix with the dimensions: [samples, time steps, features]. The model with BI-LSTM algorithm has been designed so that the network structure consists of three hidden layers with 100 LSTM units, then an output layer with the sigmoid activation. The network also represented binary crossentropy as a loss function, ADAM algorithm [35] as an optimizer, and accuracy as metrics. The network has been fitted with 20 epochs and a batch size of 2. 80% of data of the dataset was used for training and 20% was used for testing purpose.
2.2.3 Model for mortality
Classification predictive modelling was performed to predict the mortality. Models with Random Forest (RF), K Nearest Neighbor (KNN), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Bidirectional Long Short-Term Memory (BI-LSTM) algorithms were compared for mortality, and 31- and 91-days mortality prediction. The model structure for readmission prediction mentioned in Sect. 2.2.2 was used for mortality prediction scenarios.
2.3 Comparing variables
The variable that needs to be predicted is known as the target variable and the variables that are used to predict the target variable known as features. Identifying the best features is an important task [36]. A large number of features could lead to complex model, long training time, the curse of dimensionality, noise addition, overfitting etc. On the other hand, a smaller number of variables could lead to the exclusion of relevant variables. ExtraTreeRegressor [37], ExtraTreesClassifier [37], LassoCV [38] and Correlation Matrix analysis with Heat Map of scikit-learn [27] have been considered for feature selection. Moreover, in all prediction cases, 80% of the dataset was used for training and 20% was used for testing purposes.
2.3.1 Feature selection for LOS
Extra Tree Regressor showed that Age, BMI, Surgical approach, Operation time, ASA, Blood loss, Preoperative T stage, Stoma formation, Sex and Preoperative nodal stage were the most crucial features in predicting LOS (see github repository [25]). In contrast, a LASSO algorithm showed that Surgical approach, Sex, Chemotherapy, ASA, Operation mode, Stoma formation, TumID10, Procedure type, Additional procedures, Radiotherapy, Preoperative T stage, Age and, Cancer site were the most important features (see github repository [25]). Moreover, the features explored through a correlation matrix with heat map has found Surgical approach, ASA, Age, Operation mode, Complication, Stoma formation, Chemotherapy, TumID10, Preoperative T stage as essential features (see github repository [25]). Following the findings from these techniques, we considered Age, ASA, Surgical approach, Stoma formation, Preoperative T stage, Chemotherapy, Operation mode, TumID10, Cancer site, and Radiotherapy as the selected features for predicting LOS.
2.3.2 Feature selection for readmission
Extra Tree Classifier showed that Surgical approach, Operation time, LOS, BMI, Age, ASA, Blood loss, Preoperative T stage, Stoma formation were the most crucial features in predicting readmission (see github repository [25]). In contrast, a LASSO algorithm showed that Surgical approach, Operation mode, Previous surgery type, Stoma formation, Preoperative nodal stage, the Specific procedure, ASA, BMI, Age, Sex, LOS and Cancer site were the most important features (see github repository [25]). Moreover, features explored through a correlation matrix with heat map have found Surgical approach, Age, Preoperative nodal stage, Pre abdominal surgery, Previous surgery type, Operation mode, Complication, Additional procedures, Robotic, Curative Surgery, Operation time and LOS as essential features (see github repository [25]). Following the findings from these techniques, we considered Age, Surgical approach, Stoma formation, Preoperative nodal stage, Operation time, Operation mode, Previous surgery type, LOS, and the Specific procedure as the selected features for predicting readmission.
2.3.3 Feature selection for mortality
Extra Tree Classifier showed that Age, LOS, Curative Surgery, BMI, ASA, Operation time, Surgical approach, Chemotherapy, Radiotherapy, Operation mode were the most critical features in predicting mortality (see github repository [25]). A LASSO algorithm showed that Curative Surgery, Chemotherapy, Preoperative T stage, Operation mode, ASA, Preoperative M stage, Stoma formation, Age, the Specific procedure, LOS, OPCS4, BMI, Cancer site, Previous surgery type, and Surgical approach were the most important features (see github repository [25]). Moreover, features explored through a correlation matrix with heat map has found ASA, Age, BMI, Preoperative T stage, Preoperative M stage, Operation mode, Procedure type, Complication, Stoma formation, Radiotherapy, Chemotherapy, Surgical approach, Curative Surgery and LOS as essential features (see github repository [25]). Following the findings from these techniques, we considered Age, ASA, BMI, Chemotherapy, Preoperative M stage, Surgical approach, LOS, Curative Surgery, Preoperative T stage and Stoma formation as the selected features for predicting mortality. For 31 days mortality prediction, we selected Chemotherapy, Additional procedures, Operation mode, Complication, Previous abdominal surgery, Previous surgery type, Surgical approach, Curative Surgery, Age and Resection as selected features. In contrast, for 91 days mortality prediction we selected LOS, Additional procedures, Curative Surgery, Complication, Previous surgery type, Pre abdominal surgery, Procedure type, Operation mode, Surgical approach and Resection as the selected features.
2.3.4 Model for comparing variables in LOS, readmission and mortality prediction
The model with the SVR algorithm mentioned in Sect. 2.2.1 was used in comparing variables in LOS prediction. To compare variables in readmission prediction, the model with BI-LSTM algorithm mentioned in Sect. 2.2.2 was used. Moreover, the model with BI-LSTM algorithm used in Sect. 2.2.3 was used in comparing variables in all scenarios of mortality prediction.
3 Results
3.1 LOS prediction
The model with a SVR algorithm predicted the LOS with an MAE and RMSE of 9.69 and 12.52 respectively. Figure 2 shows how the model predicted the LOS. The regression outcome of the model is then converted to binary class using these conditions:
-
True class (1) = if patient’s stay is correctly predicted ≤ 6 days
-
True class (1) = if patient’s stay is correctly predicted > 6 days
-
False class (0) = if patient’s stay is incorrectly predicted ≤ 6 days
-
False class (0) = if patient’s stay is incorrectly predicted > 6 days
Converting the regression outcome to a binary class helps to calculate the accuracy of the model and accuracy of 83.21% was recorded when all available variables were considered as inputs.
3.2 Data analysis of LOS
Data analysis of different variables for LOS shows that age groups, ASA grade, whether a robotic surgery was performed or not, whether a laparoscopic operation was performed or not, operation mode and complications all have a significant impact in LOS prediction. LOS was grouped into three categories (≤ 6, 7–14 and ≥ 15 days). Different variables were also categorised accordingly (see Fig. 3). Data analysis of different variables in relation to LOS indicates that:
-
The number of patients who had a LOS of 15 days and over increased with age. Moreover, the number of patients who had a LOS of 6 days and less decreased with age.
-
Patients with better performance status (ASA grade) have shorter hospital stays compared with patients with poor performance status (ASA grade).
-
Observation of the surgical approach shows that a laparoscopic operation leads to shorter LOS compared with open operation.
-
Patients with an emergency operation were more likely to have an increased LOS compared with an elective operation.
-
Patients with a complication were more likely to have an increased LOS compared with patients without difficulty.
-
Patients’ stays in hospital are shorter when robotic surgery was performed in comparison with non-robotic surgery.
-
It was also observed that BMI is not a good indicator of LOS.
3.3 Readmission prediction
The dataset contains 321 readmission cases. Different algorithms were compared in readmission prediction, and it was found that the model with a BI-LSTM algorithm outperformed the other algorithms (see Table 6). The model with a BI-LSTM algorithm predicted the readmission with an accuracy of 87.5%, a sensitivity of 84.1% and a specificity of 90.9%. ROC curve of the model for predicting readmission is presented in Fig. 4.
3.4 Data analysis of readmission
Data analysis of different variables for readmission shows that the following factors have a strong impact on readmission: ASA grade, operation mode, additional procedure, whether the cancer is curative or not, previous abdominal surgery, types of previous surgery, preoperative M stage and time of the operation. Readmission was grouped into two categories (Readmitted and Not Readmitted). Different variables were also categorised accordingly (see Fig. 5). Data analysis of different variables concerning readmission indicates that:
-
A higher percentage of patients are readmitted in a palliative and uncertain scenario compared with a curative scenario.
-
Patients with an emergency operation are more likely to be readmitted than those with an elective operation.
-
Additional procedures during surgery increase the chance of readmission.
-
The readmission rate following colorectal surgery is low for patients with better performance status (ASA grade) compared with poor performance status (ASA grade).
-
Previous abdominal surgery before colorectal surgery increases the readmission rate.
-
Previous surgery (both bowel and non-chronic groups) prior to colorectal surgery also increases the readmission rate.
-
Patients with a preoperative M stage have a high chance of getting readmitted compared with those without.
-
A longer operation time increases the patients’ likelihood of readmission.
3.5 Mortality prediction
The model with BI-LSTM algorithms predicted the overall mortality with an accuracy of 80%, a sensitivity of 84.3% and specificity of 75.3% (Table 7). The BI-LSTM algorithms outperformed random forest, KNN, SVM and MLP in predicting mortality prediction. The model with a random forest algorithm predicted the 31 days mortality with an accuracy of 98.9%, a sensitivity of 97.9% and a specificity of 100%. The model with BI-LSTM algorithm also performed well in predicting 31 days mortality with an accuracy of 96.6%, a sensitivity of 93.7% and a specificity of 100%. Moreover, the dataset only contains 146 cases of 31 days of mortality.
The model with a BI-LSTM algorithm predicted the 91 days mortality with an accuracy of 94.2%, a sensitivity of 91.4% and a specificity of 96.3% (Table 7). The model with a random forest algorithm performed second-best in predicting 91 days mortality with an accuracy of 87.9%, a sensitivity of 89.4% and a specificity of 86.4%. ROC curve of the model predicting 91 days mortality is presented in Fig. 6. The dataset contains only 257 cases of 91 days of mortality.
3.6 Data analysis of mortality days
Data analysis of mortality days shows that ASA grade, complication, additional procedures, operation mode, whether the cancer is curative or not and preoperative M stage have a strong impact on mortality days. Mortality days were grouped into five categories (≤ 90, 91–180, 181–365, 366–730 and >730 days). Different variables were also categorised accordingly (see Fig. 7). Data analysis of different variables concerning mortality days indicates that:
-
Patients with better performance status (ASA grade) live longer than those with poor performance status (ASA grade) following colorectal surgery.
-
Complications during colorectal surgery lead to shorter lives compared with no complications. Postoperative complications also have a direct impact on patient survival.
-
Patients with a palliative scenario would live for a shorter period compared with a curative and uncertain scenario.
-
Patients with preoperative M stage have a reduced life expectancy compared with those without.
-
Additional procedures during index operation indicate advanced stage of disease and are associated with prolonged operative time and reduced survival.
-
Patients with elective surgery would live longer compared with emergency surgery.
-
It was also observed that BMI is not a good indicator of mortality.
3.7 Comparison between all variables and selected variables
The model with the SVR algorithm mentioned in Sect. 2.3.4 was fed with two different types of the dataset to predict the LOS. In one case, all 27 relevant features related to LOS were used as input. In contrast, in the other case, 11 selected variables mentioned in Sect. 2.3.1 were used as the input in predicting LOS. Comparison between the two scenarios can be seen in Table 8. It shows that the model with all variables as features performs only slightly better than the model with selected variables. The model that used all variables as features predicted the LOS with an accuracy of 83.21 %, MAE of 9.69 and RMSE of 12.52. In contrast, an accuracy of 82.61 %, MAE of 10.32 and RMSE of 12.79 were recorded when the model used selected variables. Comparison of actual versus predicted LOS of both cases can be seen in Fig. 2.
Comparison between all variables and selected variables in predicting readmission can be seen in Table 9. It shows that the model mentioned in Sect. 2.3.4 with the use of all 28 variables predicted the readmission scenario with an accuracy of 87.5%, a sensitivity of 84.1% and specificity of 90.9%. This outperforms the model outcome where nine selected variables were considered. The model with selected variables predicted the readmission scenario with an accuracy of 83.7%, a sensitivity of 76.1% and specificity of 90.9%. ROC curve of each scenario is also presented in Fig. 4.
The model mentioned in Sect. 2.3.4 for mortality prediction predicted the mortality with an accuracy of 80%, a sensitivity of 84.3% and specificity of 75.3% when all 29 variables were considered as features. The model performance decreased when the 10 selected variables were considered. An accuracy of 78.4%, sensitivity of 79% and specificity of 77.8% were recorded (see Table 10 ). The model performance does not vary much for 31 and 91 days mortality when these two different sets of data were used for comparison (see Tables 11 and 12). ROC curve of each scenario predicting 91 days mortality is also presented in Fig. 6.
4 Discussion
4.1 Role of data analytics and machine learning
LOS, readmission and mortality are widely used proxies and quality indicators of care and healthcare spending following colorectal surgery. Accurate prediction of these three proxies remains a crucial challenge after colorectal cancer surgery and would lead to substantial resource implications for clinical and management teams. Consequently, this study has aimed to predict LOS, readmission, and mortality with various machine-learning algorithms. Moreover, different predictor variables were explored and investigated through data analysis and machine learning techniques. A single centre’s data were used for the experiments. The dataset was then processed according to the models’ requirements. Data analysis and feature analysis of the dataset were also performed.
Prior research has used a limited number of variables to investigate LOS, mortality and readmission following CRC surgery [7, 8, 10, 19, 39,40,41,42]. This study includes a larger number of variables (47) including demographics, peri- and post-operative outcomes, surgical approaches, complications and mortality. These variables are used and compared in predicting LOS, mortality and readmission. Sets of key variables were identified using various data analysis techniques. Algorithms like Extra Tree Regressor, LASOO and correlation matrix with heat map help to extract essential features from all variables. Comparison between using all variables and selected variables has shown that the machine learning model performs better with all variables than the selected variables in predicting LOS, readmission and mortality prediction. This observation confirms the benefit of applying machine learning algorithms when high-dimensional data are available. In principle, a ML algorithm ought to be able to make use of all available information, giving a lower weighting to the less useful information.
Medical researchers have found ML algorithms helpful to predict the diagnosis and prognosis of various diseases and health conditions accurately [20, 21, 43, 44]. Moreover, a prediction model with a machine learning algorithm was used to detect early colorectal cancer and recurrence of stage IV colorectal cancer after tumour resection [23, 45]. However, prior studies related to patient LOS, readmission and mortality after colorectal surgery have been mainly limited to observation study of predictor variables, with little investigation of machine learning. The current study explores not only the significant predictor variables, but also investigates machine learning algorithms that can exploit them.
4.2 LOS prediction
Pucciarelli et al., on their observational analysis, found median LOS is 13 days [7] in comparison to 11.26 days in this study. Kelly et al., in their research, found median LOS is 14 days for elective and 21 for emergency admissions [19]. Aravani et al. found that age, comorbidity, socioeconomic deprivation, stage of the disease, and emergency operations are better predictor variables of longer LOS [8]. Age, comorbidities, marital status and emergency readmission were linked to the likelihood of longer LOS [19]. Ahmed et al. showed that ASA grade, epidurals and oral opiates are associated with an earlier discharge [41]. Chiu et al. state that minor and major complications were better predictors of LOS than preoperative demographic and disease parameters [46]. Sex, congestive heart failure, weight loss, Crohn’s disease, preoperative albumin < 3.5 g/dL and hematocrit < 47%, baseline sepsis, ASA class ≥ 3, open surgery, surgical time ≥ 190 min, post-operative pneumonia, failure to wean from mechanical ventilation, deep venous thrombosis, urinary tract infection, systemic sepsis, surgical site infection and reoperation within 30-days from the primary surgery were the risk factors for prolonged LOS [47]. In contrast, this study found age, ASA grade, operative time, presence or absence of a stoma, robotic or laparoscopic approach to surgery, and complications are the significant predictor variables of LOS.
This study investigated the predictor variables’ scope in predicting LOS by building a prediction model with machine learning techniques, unlike the prior work. For LOS prediction, models were built using different machine learning algorithms, and the results were compared to find the best model. The model with SVR algorithms turns out to be the best and tuned further with a tuning algorithm. Finally, the adjusted model is used for LOS prediction, and the model predicted the LOS with an MAE and RMSE of 9.69 and 12.52, respectively. The LOS prediction regression outcome is then converted to a binary class with a few conditions mentioned in Sect. 3.1, which showed the model could predict LOS with 83.21% accuracy. Data analysis of different predictor variables of LOS shows that Age groups, ASA grade, Surgical approach, Operation mode, Complication and Robotic surgery are the most important predictor variables for LOS Prediction. On the other hand, the data analysis also shows that BMI is not a good predictor for LOS prediction.
4.3 Readmission prediction
A national population-based study by Pucciarelli et al. found that gender, hospital location, comorbidities, type of surgery, stoma creation, open approach, rectal tumour location, and longer LOS were the predictor variables of 30-day readmission [7]. Chung et al. found that surgical site infection, hepatic disease, pulmonary disease, TNM stage, and operation time were the significant risk factors for readmission [48]. In contrast, this study found that Age, Surgical approach, Stoma formation, Preoperative nodal stage, Operation time, Operation mode, Previous surgery type, LOS, and the Specific procedure were the significant predictor variables for readmission.
A recent study by Rubens et al. created a risk model for predicting 30-day readmission rates after surgical treatment for colon cancer. Their model showed 60.2% accuracy, 58.8% sensitivity and 60.4% specificity with a limited number of variables [49]. In contrast, this study showed that the BI-LSTM algorithm predicts the readmission with an accuracy, sensitivity and specificity of 87.5%, 84.1% and 90.9%, respectively. Data analysis of the dataset found a 30-day readmission rate of 7.4%, which is comparable with the two recent reviews and meta-analyses findings ranged between 7 and 25% [50, 51].
4.4 Post-operative mortality prediction
The risk of post-operative mortality after CRC has been investigated through several scoring systems [52,53,54,55,56,57]. To date, none of these scoring systems has been found effective as a predictor, and researchers have raised questions over their accuracy and usability [40, 58,59,60]. Moreover, these scoring systems require a high level of preoperative information, including laboratory values which may not always be available and thus they have not been employed widely [61]. Murray et al. found that age, ASA ≥ 3, renal failure, ascites, heart failure, disseminated cancer, hypoalbuminemia, open surgery, non-independent status and admission from a chronic care facility are the risk factors of 30-day mortality [40]. Wilkins et al. found that age, ASA grade IV–V, Dukes’ stage D, and urgent surgery are strongly associated with post-operative mortality. Their model predicted mortality with an area under the curve of 0.88 [39].
In contrast, this study considered post-operative mortality as a simple binary classification scenario. Prediction models were built and compared with machine-learning algorithms to predict mortality, 31 and 91 days mortality. Moreover, well-known evaluation metrics (i.e., accuracy, sensitivity and specificity) were used to evaluate the model performance. Machine learning algorithms were also compared in predicting mortality, 31 days mortality and 91 days mortality. The BI-LSTM algorithm model predicted the mortality and 91 days mortality with an accuracy of 80% and 94.2%, respectively and outperformed other algorithms. The random forest algorithm model outperformed other algorithms with an accuracy of 98.9% in predicting 31 days of mortality. The BI-LSTM model predicted the 31 days mortality with an accuracy of 96.6%.
Data analysis on mortality days shows that ASA grade, Complication, Curative Surgery, Additional procedures, Operation mode and Preoperative M stage are the variables that could be used to classify different groups of mortality days. BMI is not a good indicator in categorising different groups of mortality days. Figure 7 appears to show that a higher BMI leads to higher mortality days. However, this pattern is not correct as BMI groups 31–35 and ≤ 35 only represent around 5% of the whole dataset. Moreover, data analysis of the dataset shows a 30 days mortality of 3.39% for CRC surgery, which is comparable with the results of other similar studies ranged between 0.9 and 9.9% [7, 62, 63].
4.5 Limitations
A limitation of this work has been the number of data samples, which were derived from a single centre. We are planning to collaborate with other colorectal groups to pool our data. Future work will therefore include a larger dataset comprising more samples and additional variables or features. Such additional variables may include nutritional status, obesity and metabolic syndrome, pre-operative heamoglobin and hypoalbuminemia, and comorbidities. Complications could be broken down into intra- and post-operative, along with a descriptor or Clavien-Dindo grade. It may also be informative to investigate the data patterns in colon and rectal cancers separately.
Nevertheless, some clear patterns have emerged from the data in the current study.
5 Conclusions
Data analytics and AI have been shown to be accurate tools for predicting length of stay, readmission, and mortality following colorectal cancer surgery. Such predictive capability is important for designing the best patient care and prioritising resources.
These three proxies for patient outcomes were found to share the following common significant predictors: Age, Stoma, and Operation mode. LOS was also found to be a significant predictor for the other two patient outcomes, i.e., readmission and mortality. Other predictors had greater or lesser significance for each patient outcome. Nevertheless, the prediction algorithms were most effective when using the full data set rather than just the main predictors.
Bidirectional long short-term memory (BI-LSTM) was found to be the best prediction algorithm overall. In each case, we have demonstrated accuracies of greater than 80% and sensitivities and specificities of at least 84% and 75% respectively. The best results were achieved for 31 days mortality, with 96% accuracy, 93% sensitivity and 100% specificity. With improving techniques, richer data sets, and overlaid clinical expertise, further improvements can be anticipated, leading to improved patient outcomes and more efficient healthcare services.
Data availability
Enquiries about data access can be made via JK.
Code availability
The computer code has been deposited in the github repository [25].
References
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. https://doi.org/10.3322/caac.21492.
Siegel RL, Miller KD, Goding Sauer A, Fedewa SA, Butterly LF, Anderson JC, Cercek A, Smith RA, Jemal A. Colorectal cancer statistics, 2020. CA Cancer J Clin. 2020. https://doi.org/10.3322/caac.21601.
Cancer Research UK. Bowel cancer statistics; 2018.
Gervès-Pinquié C, Girault A, Phillips S, Raskin S, Pratt-Chapman M. Economic evaluation of patient navigation programs in colorectal cancer care, a systematic review. Health Econ Rev. 2018;8(1):12. https://doi.org/10.1186/s13561-018-0196-4.
De Simone V, Ronchetti G, Franzè E, Colantoni A, Ortenzi A, Fantini M, Rizzo A, Sica G, Sileri P, Rossi P, et al. Interleukin-21 sustains inflammatory signals that contribute to sporadic colon tumorigenesis. Oncotarget. 2015;6:9908.
Candi E, Tesauro M, Cardillo C, Lena A, Schinzari F, Rodia G, Sica G, Gentileschi P, Rovella V, Annicchiarico-Petruzzelli M, et al. Metabolic profiling of visceral adipose tissue from obese subjects with or without metabolic syndrome. Biochem J. 2018;475:1019–35.
Pucciarelli S, Zorzi M, Gennaro N, Gagliardi G, Restivo A, Saugo M, Barina A, Rugge M, Zuin M, Maretto I, et al. In-hospital mortality, 30-day readmission, and length of hospital stay after surgery for primary colorectal cancer: a national population-based study. Eur J Surg Oncol. 2017;43(7):1312–23. https://doi.org/10.1016/j.ejso.2017.03.003.
Aravani A, Samy EF, Thomas JD, Quirke P, Morris EJA, Finan PJ. A retrospective observational study of length of stay in hospital after colorectal cancer surgery in England (1998–2010). Medicine. 2016. https://doi.org/10.1097/MD.0000000000005064.
Thomas JW, Holloway JJ. Investigating early readmission as an indicator for quality of care studies. Med Care. 1991. https://doi.org/10.1097/00005650-199104000-00006.
Schneider EB, Hyder O, Brooke BS, Efron J, Cameron JL, Edil BH, Schulick RD, Choti MA, Wolfgang CL, Pawlik TM. Patient readmission and mortality after colorectal surgery for colon cancer: impact of length of stay relative to other clinical factors. J Am College Surg. 2012;214(4):390–8. https://doi.org/10.1016/j.jamcollsurg.2011.12.025.
Hansen LO, Young RS, Hinami K, Leung A, Williams MV. Interventions to reduce 30-day rehospitalization: a systematic review. Ann Internal Med. 2011;155(8):520–8. https://doi.org/10.7326/0003-4819-155-8-201110180-00008.
Jencks SF. Defragmenting care. Ann Internal Med. 2010;153(11):757–8. https://doi.org/10.7326/0003-4819-153-11-201012070-00010.
Alves A, Panis Y, Mathieu P, Mantion G, Kwiatkowski F, Slim K. Postoperative mortality and morbidity in French patients undergoing colorectal surgery: results of a prospective multicenter study. Arch Surg. 2005;140(3):278–83. https://doi.org/10.1001/archsurg.140.3.278.
Tevis SE, Carchman EH, Foley EF, Harms BA, Heise CP, Kennedy GD. Postoperative ileus-more than just prolonged length of stay? J Gastrointest Surg. 2015;19(9):1684–90. https://doi.org/10.1007/s11605-015-2877-1.
Henneman D, Van Leersum NJ, Ten Berge M, Snijders HS, Fiocco M, Wiggers T, Tollenaar RAEM, Wouters MWJM. Failure-to-rescue after colorectal cancer surgery and the association with three structural hospital factors. Ann Surg Oncol. 2013;20(11):3370–6. https://doi.org/10.1245/s10434-013-3037-z.
National Cancer Intelligence Network. 30-Day post-operative mortality after colorectal cancer surgery in England; 2011.
National Bowel Cancer Audit. Annual report 2019; 2020.
Mariotto AB, Robin Yabroff K, Shao Y, Feuer EJ, Brown ML. Projections of the cost of cancer care in the united states: 2010–2020. J Natl Cancer Inst. 2011;103(2):117–28. https://doi.org/10.1093/jnci/djq495.
Kelly M, Sharp L, Dwane F, Kelleher T, Comber H. Factors predicting hospital length-of-stay and readmission after colorectal resection: a population-based study of elective and emergency admissions. BMC Health Serv Res. 2012;12(1):77. https://doi.org/10.1186/1472-6963-12-77.
Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J. 2015;13:8–17. https://doi.org/10.1016/j.csbj.2014.11.005.
Pan L, Liu G, Lin F, Zhong S, Xia H, Sun X, Liang H. Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia. Sci Rep. 2017;7(1):1–9. https://doi.org/10.1038/s41598-017-07408-0.
Passos IC, Mwangi B, Kapczinski F. Big data analytics and machine learning: 2015 and beyond. Lancet Psychiatry. 2016;3(1):13–5. https://doi.org/10.1016/s2215-0366(15)00549-0.
Hornbrook MC, Goshen R, Choman E, O’Keeffe-Rosetti M, Kinar Y, Liles EG, Rust KC. Early colorectal cancer detected by machine learning model using gender, age, and complete blood count data. Dig Dis Sci. 2017;62(10):2719–27. https://doi.org/10.1007/s10620-017-4722-8.
Bauer M, Fitzgerald L, Haesler E, Manfrin M. Hospital discharge planning for frail older people and their family. Are we delivering best practice? A review of the evidence. J Clin Nurs. 2009;18(18):2539–46. https://doi.org/10.1111/j.1365-2702.2008.02685.x.
Masum S. Eventpredictions. https://github.com/shamsulmasum/Colorectal_Feature_Analysis. 2021.
McKinney W, et al. Data structures for statistical computing in python. In: Proceedings of the 9th python in science conference, vol. 445. Austin; 2010. p. 51–56. https://doi.org/10.25080/Majora-92bf1922-00a.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30. https://doi.org/10.5555/1953048.2078195.
Yeo IK, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87(4):954–9. https://doi.org/10.1093/biomet/87.4.954.
Kuhn M, Johnson K. Feature engineering and selection: a practical approach for predictive models. Boca Raton: CRC Press; 2019. https://doi.org/10.1201/9781315108230.
Refaeilzadeh P, Tang L, Liu H. Cross-validation. Encycl Database Syst. 2009;5:532–8. https://doi.org/10.1007/978-0-387-39940-9_565.
Müller AC, Guido S, et al. Introduction to machine learning with Python: a guide for data scientists. Sebastopol: O’Reilly Media, Inc.; 2016.
Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13(1):281–305. https://doi.org/10.5555/2188385.2188395.
Masum S, Liu Y, Chiverton J. Multi-step time series forecasting of electric load using machine learning models. In: International conference on artificial intelligence and soft computing. New York: Springer; 2018. p. 148–59. https://doi.org/10.1007/978-3-319-91253-0_15.
Masum S, Chiverton JP, Liu Y, Vuksanovic B. Investigation of machine learning techniques in forecasting of blood pressure time series data. In: International conference on innovative techniques and applications of artificial intelligence. New York: Springer; 2019. p. 269–82. https://doi.org/10.1007/978-3-030-34885-4_21.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint. arXiv:1412.6980. 2014.
Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3(Mar):1157–82. https://doi.org/10.1162/153244303322753616.
Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63(1):3–42. https://doi.org/10.1007/s10994-006-6226-1.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological). 1996;58(1):267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Wilkins S, Oliva K, Chowdhury E, Ruggiero B, Bennett A, Andrews EJ, Dent O, Chapuis P, Platell C, Reid CM, et al. Australasian ACPGBI risk prediction model for 30-day mortality after colorectal cancer surgery. BJS Open. 2020;4(6):1208. https://doi.org/10.1002/bjs5.50356.
Murray AC, Mauro C, Rein J, Kiran RP. 30-day mortality after elective colorectal surgery can reasonably be predicted. Tech Coloproctol. 2016;20(8):567–76. https://doi.org/10.1007/s10151-016-1503-x.
Ahmed J, Lim M, Khan S, McNaught C, MacFie J. Predictors of length of stay in patients having elective colorectal surgery within an enhanced recovery protocol. Int J Surg. 2010;8(8):628–32. https://doi.org/10.1016/j.ijsu.2010.07.294.
Vicendese D, Marvelde LT, McNair PD, Whitfield K, English DR, Taieb SB, Hyndman RJ, Thomas R. Hospital characteristics, rather than surgical volume, predict length of stay following colorectal cancer surgery. Aust N Z J Public Health. 2020;44(1):73–82. https://doi.org/10.1111/1753-6405.12932.
Esteban C, Arostegui I, Moraza J, Aburto M, Quintana JM, Pérez-Izquierdo J, Aizpiri S, Capelastegui A. Development of a decision tree to assess the severity and prognosis of stable COPD. Eur Respir J. 2011;38(6):1294–300. https://doi.org/10.1183/09031936.00189010.
Barakat N, Bradley AP, Barakat MNH. Intelligible support vector machines for diagnosis of diabetes mellitus. IEEE Trans Inf Technol Biomed. 2010;14(4):1114–20. https://doi.org/10.1109/titb.2009.2039485.
Xu Y, Ju L, Tong J, Zhou CM, Yang JJ. Machine learning algorithms for predicting the recurrence of stage IV colorectal cancer after tumor resection. Sci Rep. 2020;10(1):1–9. https://doi.org/10.1038/s41598-020-59115-y.
Chiu HC, Lin YC, Hsieh HM, Chen HP, Wang HL, Wang JY. The impact of complications on prolonged length of hospital stay after resection in colorectal cancer: a retrospective study of Taiwanese patients. J Int Med Res. 2017;45(2):691–705. https://doi.org/10.1177/0300060516684087.
Lobato LFC, Ferreira PCA, Wick EC, Kiran RP, Remzi FH, Kalady MF, Vogel JD. Risk factors for prolonged length of stay after colorectal surgery. J Coloproctol. 2013;33(1):22–7. https://doi.org/10.1097/00000658-199908000-00016.
Chung JS, Kwak HD, Ju JK. Thirty-day readmission after elective colorectal surgery for colon cancer: a single-center cohort study. Ann Coloproctol. 2020;36(3):186. https://doi.org/10.3393/ac.2019.11.04.
Rubens M, Ramamoorthy V, Saxena A, Bhatt C, Das S, Veledar E, McGranaghan P, Viamonte-Ros A, Odia Y, Chuong M, et al. A risk model for prediction of 30-day readmission rates after surgical treatment for colon cancer. Int J Colorectal Dis. 2020;35:1529–35. https://doi.org/10.1007/s00384-020-03605-y.
Damle RN, Alavi K. Risk factors for 30-d readmission after colorectal surgery: a systematic review. J Surg Res. 2016;200(1):200–7. https://doi.org/10.1016/j.jss.2015.06.052.
Li LT, Mills WL, White DL, Li A, Gutierrez AM, Berger DH, Naik AD. Causes and prevalence of unplanned readmissions after colorectal surgery: a systematic review and meta-analysis. J Am Geriatr Soc. 2013;61(7):1175–81. https://doi.org/10.1111/jgs.12307.
Tekkis PP, Poloniecki JD, Thompson MR, Stamatakis JD. Operative mortality in colorectal cancer: prospective national study. Bmj. 2003;327(7425):1196–201. https://doi.org/10.1136/bmj.327.7425.1196.
Copeland GP, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg. 1991;78(3):355–60. https://doi.org/10.1002/bjs.1800780327.
Tekkis PP, Prytherch DR, Kocher HM, Senapati A, Poloniecki JD, Stamatakis JD, Windsor ACJ. Development of a dedicated risk-adjustment scoring system for colorectal surgery (colorectal POSSUM). Br J Surg. 2004;91(9):1174–82. https://doi.org/10.1002/bjs.4430.
Prytherch DR, Whiteley MS, Higgins B, Weaver PC, Prout WG, Powell SJ. POSSUM and Portsmouth POSSUM for predicting mortality. Br J Surg. 1998;85(9):1217–20. https://doi.org/10.1046/j.1365-2168.1998.00840.x.
Fazio VW, Tekkis PP, Remzi F, Lavery IC. Assessment of operative risk in colorectal cancer surgery: the Cleveland clinic foundation colorectal cancer model. Dis Colon Rectum. 2004;47(12):2015–24. https://doi.org/10.1007/s10350-004-0704-y.
Alves A, Panis Y, Mathieu P, Kwiatkowski F, Slim K, Mantion G. Mortality and morbidity after surgery of mid and low rectal cancer: results of a French prospective multicentric study. Gastroenterol Clin Biol. 2005;29(5):509–14. https://doi.org/10.1016/s0399-8320(05)82121-9.
Ferjani AM, Griffin D, Stallard N, Wong LS. A newly devised scoring system for prediction of mortality in patients with colorectal cancer: a prospective study. Lancet Oncol. 2007;8(4):317–22. https://doi.org/10.1016/s1470-2045(07)70045-1.
Senagore AJ, Warmuth AJ, Delaney CP, Tekkis PP, Fazio VW. POSSUM, p-POSSUM, and cr-POSSUM: implementation issues in a united states health care system for prediction of outcome for colon cancer resection. Dis Colon Rectum. 2004;47(9):1435–41. https://doi.org/10.1007/s10350-004-0604-1.
Van Der Sluis FJ, Espin E, Vallribera F, de Bock GH, Hoekstra HJ, Van Leeuwen BL, Engel AF. Predicting postoperative mortality after colorectal surgery: a novel clinical model. Colorectal Dis. 2014;16(8):631–9. https://doi.org/10.1111/codi.12580.
Williams DJ, Walker JD. A nomogram to calculate the physiological and operative severity score for the enumeration of mortality and morbidity (POSSUM). Br J Surg. 2014;101(3):239–45. https://doi.org/10.1002/bjs.9363.
van Eeghen EE, den Boer FC, Loffeld RJLF. Thirty days post-operative mortality after surgery for colorectal cancer: a descriptive study. J Gastrointest Oncol. 2015;6(6):613. https://doi.org/10.3978/j.issn.2078-6891.2015.079.
Henneman D, van Bommel ACM, Snijders A, Snijders HS, Tollenaar RAEM, Wouters MWJM, Fiocco M. Ranking and rankability of hospital postoperative mortality rates in colorectal cancer surgery. Ann Surg. 2014;259(5):844–9. https://doi.org/10.1097/sla.0000000000000561.
Funding
This work has been supported by the University of Portsmouth’s Thematic Research & Innovation Fund (TRIF).
Author information
Authors and Affiliations
Contributions
AH and JK conceived the project idea. SM conducted the data analytics and machine learning under the supervision of AH. JK and his team (SS and KF) provided the data and clinical interpretation. SM wrote the original manuscript and provided all the figures. AH, SM, and JK reviewed and revised the article. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Masum, S., Hopgood, A., Stefan, S. et al. Data analytics and artificial intelligence in predicting length of stay, readmission, and mortality: a population-based study of surgical management of colorectal cancer. Discov Onc 13, 11 (2022). https://doi.org/10.1007/s12672-022-00472-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12672-022-00472-7