Introduction 

Artificial intelligence (AI) is a new field in medicine gaining major interest within healthcare, but its development in clinical settings is already referred to as a digital revolution for healthcare [1].

Artificial intelligence is defined as computer science capable of imitating several aspects of human intelligence and behavior [2]. With the use of large datasets, AI models can be trained to conduct several complicated tasks [3]. Machine learning, one of the domains of AI, is a computer system in which models are trained to form new predictions or decisions by analyzing large quantities of data [4]. A specific subclass of machine learning known as deep learning uses multiple layers to analyze imported data. In each layer, weights are calculated for several factors from the data. After repeating this process, a final model is trained and ready to be applied on new data. Examples of both machine and deep learning techniques are presented in Table 1.

Table 1 Definitions of subclasses within AI

Several potentials of AI models have already been demonstrated in clinical practice [14, 15]. For example, machine learning algorithms have been applied to MRI, X-ray, and CT images to detect tumors in various organs. Additionally, input from large numbers of electronic health records enabled AI models to identify risk factors for multifactorial outcomes such as length of stay, mortality, and early hospital readmission after surgery [16]. Recently, in colorectal surgery, machine learning was used to predict outcomes such as lymph node metastasis, response to chemoradiotherapy, and postoperative complications. For these outcomes, predictions were performed with accuracies up to 96%. This could emphasize the potential of machine learning to support risk stratification and facilitate clinical decision-making for general surgeons [17,18,19].

Currently, bariatric surgery has evolved to being a key in treating the worldwide pandemic of morbid obesity. Optimal postoperative weight loss including resolution of obesity-related comorbidities leads to a decreased burden of disease and related mortality [20, 21]. Despite an increasing amount of large data set studies in bariatric surgery, several factors such as short- and long-term complication rates and weight loss remain unpredictable. An example in which AI could benefit bariatric surgery is insufficient weight loss after surgery. Ten to thirty percent of patients show insufficient weight loss after bariatric surgery [22]. Risk factors for this are extremely diverse varying from socio-economic factors such as insurance policy to a specific type of microbiome [23, 24]. A complete overview of all risk factors and ideally an algorithm to calculate the risk of insufficient weight loss for each patient separately is still missing. Assembling an algorithm to identify both patients at major risk of insufficient weight loss and high risk of postoperative complications would assist the bariatric surgeon as well as the patient to reach a well-informed decision.

Despite the potential benefits of AI, the scope of machine learning applications is rarely reported. Therefore, this systematic review aims to provide an extensive overview of (potential) machine learning applications within bariatric surgery.

Materials and Methods

Search Strategy

A systematic search was performed in accordance with the Cochrane Handbook for Systematic Reviews of Interventions version 6.0 and PRISMA guidelines. To identify all relevant publications, systematic searches were conducted in the bibliographic databases PubMed, Embase.com, Clarivate Analytics/Web of Science Core Collection, and the Wiley/Cochrane Library from inception up to the 7th of July 2021. The search included keywords and free text terms for (synonyms of) ‘machine learning’ combined with (synonyms of) ‘digestive system surgical procedures’ and ‘bariatric surgery’. The full search strategy can be found in the Supplementary information (see Appendix).

Selection Process

Two reviewers (MB and JCP) conducted the title and abstract screening independently in accordance with the inclusion and exclusion criteria. Studies were only selected for full-text assessment if both reviewers agreed on inclusion. Controversies between reviewers were resolved by discussions, resulting in consensus. Studies were included if they met the following criteria: (i) describing machine learning algorithms within bariatric surgery, (ii) clinical study, (iii) including adults. Studies were excluded if they (i) did not describe bariatric surgery specifically, (ii) were not written in English, (iii) were certain publication types: reviews, editorials, letters, legal cases, or interviews.

Risk of Bias Evaluation

The ROBINS-I assessment tool was applied by two reviewers (MB and JCP) to evaluate the methodological quality of included non-randomized studies [25]. Additionally, the PROBAST tool was used by two reviewers (MB and JCP) to assess the quality of machine learning models [26]. Conflicts between reviewers were solved by discussions.

Data Synthesis and Outcome Assessment

Following full-text screening, the following data were extracted from the included studies; first author, year of publication, country, number of patients included, mean age of the study population, percentage of female patients, study design, follow-up time, surgical procedure, type of machine learning, external validation, purpose of machine learning, outcome measurements, and prediction performance. The categorization of studies was based on machine learning purposes and results were demonstrated separately.

Results

Study Selection and Characteristics

The systematic literature search generated a total of 1821 references after removal of duplicates. After screening of titles and abstracts, 21 studies remained for full-text assessment. Eleven full texts were included. The flow chart of the search and selection process is presented in Fig. 1. Table 2 summarizes the general characteristics of the included studies.

Fig. 1
figure 1

PRISMA flow diagram of the search

Table 2 General characteristics of included studies

Risk of Bias Evaluation

As all included studies were either retrospective (n = 10) or prospective (n = 1) cohort studies, the ROBINS-I assessment tool was used for quality assessment of all included studies (Fig. 2a). Since the primary outcome of this study was the type of machine learning techniques being used, domains such as bias due to confounding and bias in outcome measurements obtained low risk of bias scores. However, due to the retrospective design of these studies, a moderate risk of bias was found in the intervention classification domain. Furthermore, results of the Probast score per domain are demonstrated in Fig. 2b.

Fig. 2
figure 2

a Methodological quality assessment of the non-randomized studies, according to ROBINS-I assessment tool. b Quality of machine learning models according to the Probast tool

Categorization of Machine Learning Techniques

Purposes of machine learning were prediction of postoperative complications (n = 5), prediction of the amount of postoperative weight loss (n = 3), aid in decision-making preoperatively (n = 1), predicting presence of hiatal hernias (n = 1), and prediction of quality of life (n = 1). The frequency at which each form of machine learning technique was used in the included studies is summarized in Fig 3.

Fig. 3
figure 3

Applied forms of machine learning

Postoperative Complications

Five studies demonstrated the use of machine learning algorithms to predict postoperative complications.

Sheikhtaheri et al. developed a model to predict postoperative complications within 90 days after one anastomosis gastric bypass surgery (OAGB), by using an ANN algorithm [27]. These complications included bleeding, anastomotic leakage, obstruction, intraabdominal abscess, and pulmonary embolism. Thirty-two factors ranging from age and BMI to smoking and laboratory test results were considered important in this prediction model. For the postoperative period of 10 days, the highest accuracy of the model was obtained; an AUC of 0.98 was observed.

Cao et al. (2019) applied multiple machine learning algorithms to detect severe complications within 30 days after bariatric surgery [28]. Machine learning techniques included decision tree, random forest, gradient boosting, SVM, and ANN models. Results have revealed the following performances for the models (accuracy, AUC): decision tree (92%, 0.5), random forest (95%, 0.51), gradient boosting (96%, 0.58), SVM (96%, 0.5), and ANN (96%, 0.54).

Consequently, Cao et al. (2020) applied ANN, and CNN models to predict serious complications within 30 days after bariatric surgery. Serious complications were defined as Clavien–Dindo classification grade 3b and higher (i.e., anastomotic leakage, organ failure, or death) [29]. For each model, the predictive performance was described by means of the accuracy, and AUC. The ANN model showed an accuracy of 84%, and an AUC of 0.54. For the CNN model, the accuracy was 95% and the AUC appeared to be 0.57 for predicting postoperative complications.

The authors of the 4th study used ANN and GBM models to predict gastrointestinal leak and venous thromboembolism in patients undergoing a laparoscopic gastric bypass or laparoscopic sleeve gastrectomy [30]. For gastrointestinal leakage, the ANN and GBM model showed the following predictive capabilities, respectively; an AUC of 0.75 and 0.70. In predicting venous thromboembolisms, the ANN algorithm and gradient boosting model achieved the following values, respectively; an AUC of 0.65 and 0.67. Out of 37 variables, the most important factors in predicting both gastrointestinal leakage and venous thromboembolisms were age, height, and weight-related measures, hematocrit, albumin, and assistant training level. A history of deep vein thrombosis was an additional important variable for prediction of venous thromboembolisms.

Wise et al. (2019) aimed to predict the readmission rate of 3.1%, the reoperation and reintervention rate of 8.7%, and the mortality rate of 0.07% within 30 days after laparoscopic sleeve gastrectomy in a large cohort [31]. For this ANN model, an AUC of 0.59 was detected. Moreover, the following seven factors appeared to be important for the prediction of 30-day morbidity and mortality: age, race, BMI, hypertension, diabetes mellitus, functional status, and previous surgery.

Weight Loss

All three studies aimed to predict postoperative weight loss by applying ANN models.

Piaggi et al. aimed to predict the percentage excess weight loss (%EWL) in women with severe obesity, 2 years after the laparoscopic adjustable gastric banding procedure [32]. %EWL at 2 years postoperatively was 48.2%. The ANN model developed was based on preoperative data including the comprehensive test of psychopathology Minnesota Multiphasic Personality Inventory-2. The model showed an AUC of 0.80 for this prediction. Age, paranoia, antisocial practices, and Type A behavior were independent predictors of %EWL.

Wise et al. (2016) used an ANN model to predict the percentage excess body mass index loss (%EBMIL) 180 and 360 days after laparoscopic Roux-en-Y gastric bypass surgery based on preoperative variables such as BMI, race, and gender [33]. The %EBMIL was 73.5% 1 year, postoperatively. The AUC for this model was observed to be 0.83. The variables gender, race, BMI, and diabetes mellitus appeared to be the key factors for postoperative weight loss.

Lastly, Lee et al. used 17 preoperative factors to predict successful %EWL 2 years after laparoscopic OAGB or gastric banding [34]. Success in %EWL was defined as %EWL > 50% which was accomplished by 84% of the patients. The ANN model showed an accuracy of 94% and the type of operation, HbA1c, and triglyceride levels appeared to be essential for predicting successful %EWL at 2 years postoperatively.

Decision-Making

Aminian et al. developed a prediction model using an RF algorithm to estimate the risk of long-term end-organ complications in patients with type 2 diabetes and obesity when considering bariatric surgery [35]. The discriminating ability at 10 years was measured in the area under the curve (AUC) and resulted in the following for the surgical and non-surgical groups, respectively; all-cause mortality 0.79 and 0.81, coronary artery events 0.66 and 0.67, heart failure 0.73 and 0.75, and nephropathy 0.73 and 0.76. The five most important variables in the prediction models of all-cause mortality were age, BMI at enrollment, history of heart failure, insulin use, and smoking status.

Diagnosis

Assaf et al. developed a decision tree model for the preoperative prediction of the presence of hiatal hernias (HH) in patients undergoing a laparoscopic sleeve gastrectomy procedure [36]. This is relevant as the presence of a hiatal hernia may impose per-operative technical challenges which is why foreknowledge is beneficial. The model showed an accuracy of 88.2% for the prediction of hiatal hernias. Thirteen variables were observed to be influencing the prediction of hiatal hernias, in which reflux symptoms, higher age, and BMI were discovered to be associated with a higher risk of hiatal hernias. Additionally, lower age and BMI have been discovered to be related to shorter operation lengths.

Postoperative Quality of Life

Cao et al. (2019) built a CNN model to predict the postoperative health-related quality of life 1, 2, and 5 years after a primary gastric bypass procedure [37]. The postoperative quality of life was measured by the RAND-SF-36 questionnaire and the obesity-related problems scale (OP). Performance of the machine learning algorithm was presented as the mean squared error, indicating the discrepancy between the observed value and predicted value. The mean squared error for the CNN model was 0.035 in predicting the postoperative quality of life.

Discussion

From this systematic review, it can be concluded that artificial intelligence has potentials in several fields within bariatric surgery. Various models have been created to predict severe complications with AUCs up to 0.98. Secondly, weight loss was predicted by AUCs ranging from 0.80 to 0.83. Lastly, an AUC up to 0.81 was observed in predicting the postoperative quality of life, diagnosis, and end-organ complications of patients with morbid obesity.

Five studies have applied machine learning models to predict postoperative complications for patients undergoing bariatric surgery. Among several models, neural networks have shown the highest accuracy of 98% in predicting postoperative complications. Ideally, by using machine learning models, bariatric surgeons will be able to better predict (severe) postoperative complications for each unique patient. These predictions can, in theory, influence the decision towards a different type of bariatric operation or different timing of the operation, more specific prophylactic measures to prevent a certain type of complication, or a shared decision with complete informed consent.

In a recent study, the “low-risk bariatric patient” was defined by the absence of factors such as a medical history of thromboembolic events, diabetes mellitus, and kidney or pulmonary disease [38]. In this review, overlapping risk factors have been identified in the included studies predicting postoperative complications and weight loss (Table 3). It is of no surprise that age, BMI, previous intra-abdominal surgery, diabetes, and cardiovascular disease were identified as risk factors for postoperative severe complications. However, other factors such as race, inflammatory bowel disease, laboratory results, and functional status are more controversial. Not all clinical variables were included in a similar or homogeneous manner across the included studies. This is despite the hypothesis that inclusion of previously excluded variables may improve the accuracy of machine learning models to predict postoperative complications and related risk factors. In the field of breast cancer surgery, the exclusion of variables in machine learning models was prevented by determining many variables based on pre-operative, intra-operative, and post-operative means [39]. These findings could suggest that guidelines are needed to secure a comprehensive list of clinical factors that can be used for an optimal training process of machine learning models.

Table 3 Summary of overlapping factors for postoperative complications and weight loss

Three studies have attempted to predict postoperative weight loss. Neural networks demonstrated the highest AUC of 0.94 in predicting postoperative weight loss. For decades now, researchers in the bariatric field have attempted to identify all risk factors for insufficient weight loss after bariatric surgery. Multiple studies have shown that postoperative weight loss is dependent on multiple factors, both objective measures such as BMI and subjective measures such as patient-related measures. It could therefore be specifically beneficial and interesting for bariatric surgeons to implement AI as a means of identifying risk factors for, for example, insufficient WL. However, as Nudel et al. noted [30], external validation of the machine learning model was missing due to insufficient data. Therefore, more large datasets are needed before accurate and valid models can be developed.

For predicting the risk of long-term end-organ complications, such as coronary artery events, heart failure, and nephropathy in patients suffering from type 2 diabetes and morbid obesity, a random forest model showed an AUC of 0.66, 0.73, and 0.73, respectively. According to Aminian et al. [35], this random forest model may support and accelerate the process of decision-making toward bariatric surgery. This is desirable as the duration of obesity itself and the presence of its related comorbidities have repeatedly been reported to lead to less postoperative weight loss and higher comorbidity-related mortality [40,41,42]. As weight loss after bariatric surgery is not always associated with health-related quality of life, predicting the increase in quality of life after bariatric surgery is a welcome algorithm in the process of expectation management and shared decision-making, preoperatively [43, 44]. Neural networks have shown a mean squared error of 0.035 in predicting the postoperative health-related quality of life 1, 2, and 5 years after bariatric surgery, indicating an accurate estimation, since the mean squared error was close to 0. This neural network model might provide the opportunity to improve postoperative care and rehabilitation for patients undergoing bariatric surgery. However, due to missing patient information, the generalizability of this model might be uncertain. Missing data could be solved by imputation, as this was done in the study of Tseng et al. [45], in which machine learning models were used to predict acute kidney injury after cardiac surgery. One study predicted the presence of hiatal hernias. The importance of hiatal hernia (HH) present at the time of bariatric surgery remains controversial but is increasingly recommended to be corrected simultaneously with the laparoscopic sleeve gastrectomy [46]. Nevertheless, gastroesophageal reflux symptoms may worsen or persist, and a secondary operation with conversion from sleeve gastrectomy to LRYGB may be necessary [47]. The foreknowledge of the presence of HH may both influence the patient and surgeon in decision-making towards LRYGB and predict a longer operation time. However, as the authors of this study mention, the accuracy of the models developed is not impressive and the study should be regarded as proof of concept, exploring the possibilities with AI.

Due to the missing external validation in most studies, the first step for future studies in bariatric surgery should be the inclusion of external validation cohorts to gain more generalizability of machine learning models. Afterwards, clinical trials should be conducted to facilitate the implementation of ML models within bariatric surgery. For both steps, large amounts of data are required for the training process of these models. This data could be retrieved from available patient databases or robotic surgery, eventually facilitating the training process of machine learning [48, 49].

This review has revealed that machine learning models have potentials to predict postoperative complications, weight loss, end-organ complications, quality of life, and preoperative diagnosis. After the necessary steps to improve generalizability and clinical validation, machine learning models may have a significant impact on decision-making within bariatric surgery. As machine learning models are improved and validated, surgeons could be one step closer to achieving personalized decision-making for patients undergoing bariatric procedures.

To use machine learning models for the prediction of surgical outcomes in bariatric surgery, data from laparoscopic bariatric surgery should be accessible [50]. Laparoscopic videos of bariatric procedures could be collected to serve as a training database for machine learning models. By providing accurate image navigation during surgery, anatomical landmarks and unexpected intraoperative findings such as adhesions and abdominal wall hernias could be identified efficiently by machine learning models [51]. In addition, perioperative data collected from anesthesiologists could be collected such as continuous blood pressure measures or oxygen saturation as factors possibly predicting postoperative complications. Furthermore, as robotic surgery is often performed in bariatric surgery, machine learning models could also improve the performance of robotic surgery by providing 3D mapping during surgery and evaluating surgical skills afterward [52].

This review has several limitations. External validation cohorts seem to be missing for most studies, indicating the uncertainty of machine learning models. Therefore, big data from clinical settings are required to achieve appropriate generalizability and accuracy for machine learning models [53]. Additionally, due to the presence of inconsistencies in reported accuracies and AUCs, a meta-analysis could not be conducted.

Conclusion

In this review, promising predictive capabilities of machine learning have been discovered within bariatric surgery. Machine learning has predominantly been used for prediction of postoperative complications and weight loss. However, ML algorithms have mainly been applied to datasets without external validation. To overcome this problem, additional data from large patient databases, laparoscopic surgery, or robotic surgery should be used. By validating ML models, the clinical implementation of ML will be facilitated.