1 Introduction

In recent times, a large proportion of the population has neglected dietary principles and hygiene, so gastrointestinal diseases are expected to become more prevalent. A large proportion of digestive system problems are attributed to Gallbladder (GB) diseases. The primary role of the GB is to temporarily store bile, which is formed by the liver and aids digestion. Hence, an increase in bile production or an imbalance in bile composition can lead to the formation of stones.

Research and statistics show that GB disease, ranging from gallstones to cancer [1, 2], is very common. It affects roughly 10% of China’s adult population, of whom 12–15% have gallstones associated with choledocholithiasis [3]. Furthermore, cholelithiasis affects over 20 million Americans per year [4]. Moreover, gallstones often go undiagnosed because many patients are asymptomatic [5]. Patients who develop symptoms can present with a range of clinical manifestations, such as nausea, jaundice, intestinal discomfort and cholecystalgia. Accordingly, a large number of medical procedures are performed to treat patients with gallstones (associated with choledocholithiasis). In addition, according to the Canadian Cancer Society (CCS), 19% of patients diagnosed with GB cancer survive for at least 5 years; however, only 4% of those diagnosed at stage 4 survive for 5 or more years, compared with 50% of those diagnosed at stage 1. Consequently, early detection and diagnosis are crucial for effective treatment planning and patient care.

The diagnosis of GB diseases raises many challenges owing to the complex nature of the organ and the wide range of pathological conditions involved. With the rapid growth of medical science and technology over the past years, AI methods have enabled significant progress in disease diagnosis and in drug development and discovery [6]. For example, AI has modernised the examination of laparoscopic surgical views, which are frequently used in the treatment of gallstones [7].

Medical imaging is the primary diagnostic tool for GB diseases [8]. Prominent imaging methods, such as Ultrasound Imaging (UI), Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), have been widely used, but they often require extensive manual analysis and expert interpretation. Recently, Machine Learning (ML) and Deep Learning (DL) techniques have shown a marked ability to analyse medical images and to assist healthcare professionals in detecting and classifying various GB abnormalities. By automating the analysis process and exploiting large datasets and sophisticated algorithms, these approaches have shown encouraging potential to improve GB diagnosis and enhance diagnostic accuracy [9].

In this survey paper, we aim to provide a comprehensive overview of the state-of-the-art ML and DL techniques employed for GB diagnosis. We explore various methodologies and their application to GB image classification, segmentation and disease detection, and then examine their strengths, limitations and future prospects. By evaluating the existing literature and identifying key trends and challenges, this survey aims to contribute to the understanding and advancement of GB diagnostic methods, thereby improving patient outcomes. It also gives researchers working in this field the opportunity to explore AI techniques related to GB disease in one place, by highlighting successful applications, identifying gaps in current research and suggesting potential areas for future exploration.

We categorise the AI techniques applied to GB disease into two groups, (1) ML techniques and (2) DL techniques, and the paper is organised according to these groups.

Section 2 presents the methodology used to select the papers. ML and DL techniques are introduced in Sect. 3. The studies that used ML and DL techniques for GB disease are discussed in Sects. 4 and 5, respectively. Sect. 6 discusses the limitations of each study and presents the conclusions.

2 Methodology

To conduct a useful review of AI techniques for GB disease diagnosis, we formulated the following questions: What AI techniques are currently used in the diagnosis and treatment of GB disease? How have these techniques been applied to different types of GB disease? How effective have they been in improving the accuracy of GB disease diagnosis? What challenges and limitations are associated with using AI techniques for GB disease? What is the potential for future development and application of AI techniques in the field of GB disease?

On this basis, we searched several search engines and databases, including Google Scholar, IEEE Xplore, ACM, Springer, MDPI and Scopus, to gather relevant publications. We focused the search on publications in reputable journals and conferences. This review covers published research on GB diagnosis using AI techniques, especially ML and DL approaches. We included peer-reviewed studies published in English within the last 10 years. We excluded non-original research such as opinion pieces and case reports, papers whose full text was inaccessible, and studies that utilise Artificial Intelligence (AI) for purposes other than the diagnosis of GB diseases.

Although ML and DL are popular topics, relatively few papers have applied them to GB disease. The search expressions were: (1) (“machine learning” OR “deep learning”) AND (“gallbladder”), (2) (“machine learning”) AND (“gallbladder”), and (3) (“deep learning”) AND (“gallbladder”). The search process comprised three steps: (1) Identification, (2) Screening and (3) Eligibility. In the identification step, the search expressions were entered into the databases. In the screening step, the search results were checked against the scope of this paper. In the eligibility step, the title, abstract, results and conclusions of each paper were evaluated. The initial search yielded 60 research papers, of which 50 remained after the exclusion step; examining their abstracts narrowed the selection further to 44 papers. Of these 44 papers, 27 used ML approaches and 17 used DL approaches. The complete selection process is illustrated in Fig. 1, and the classification of the selected papers is shown in Fig. 2.
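The three search expressions and the three-step process above can be sketched as a small filtering pipeline. This is a minimal illustration, not the tooling actually used for this review: the `Record` fields, the example titles, the reference year and the mapping of criteria to steps (scope check at screening, inclusion criteria at eligibility) are hypothetical assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical metadata fields; real database exports differ.
    title: str
    abstract: str
    year: int
    language: str
    peer_reviewed: bool
    original_research: bool  # False for opinion pieces, case reports, etc.
    gb_diagnosis: bool       # True if AI is used to diagnose GB disease


def has_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word match of a quoted search term."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


# The three search expressions from the text, as predicates over title + abstract.
QUERIES = [
    lambda t: (has_term(t, "machine learning") or has_term(t, "deep learning"))
    and has_term(t, "gallbladder"),
    lambda t: has_term(t, "machine learning") and has_term(t, "gallbladder"),
    lambda t: has_term(t, "deep learning") and has_term(t, "gallbladder"),
]


def identification(records):
    # Step 1: keep every record returned by at least one search expression.
    return [r for r in records if any(q(f"{r.title} {r.abstract}") for q in QUERIES)]


def screening(records):
    # Step 2 (assumed): keep records within scope, i.e. AI used for GB diagnosis.
    return [r for r in records if r.gb_diagnosis]


def eligibility(records, current_year, window_years=10):
    # Step 3 (assumed): peer-reviewed, English, original research, last 10 years.
    return [
        r for r in records
        if r.peer_reviewed and r.language == "English" and r.original_research
        and current_year - r.year <= window_years
    ]


# Hypothetical records: (title, abstract, year, language, peer_reviewed,
# original_research, gb_diagnosis)
records = [
    Record("Deep learning for gallbladder polyp diagnosis", "CNN on ultrasound images",
           2021, "English", True, True, True),
    Record("Machine learning in liver transplantation", "Survival prediction",
           2020, "English", True, True, False),
    Record("Gallbladder cancer: an opinion", "A machine learning perspective",
           2019, "English", True, False, True),
    Record("Machine learning for gallbladder stones", "SVM classifier",
           2005, "English", True, True, True),
    Record("Deep learning for gallbladder surgery video indexing", "Workflow recognition",
           2022, "English", True, True, False),
]
found = identification(records)
in_scope = screening(found)
selected = eligibility(in_scope, current_year=2023)
print(len(found), len(in_scope), len(selected))  # 4 3 1
```

Encoding each step as a separate function keeps the counts at every stage visible, which is what a flowchart such as Fig. 1 reports.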

Fig. 1 Flowchart of the methodology

Fig. 2 Classification diagram of the used papers

3 Machine Learning and Deep Learning Techniques

3.1 Background of Machine Learning

ML, a subfield of AI, enables the development of algorithms and models that allow computers to learn from data and make predictions and decisions. The concept of ML dates back several decades, but recent advances in computer science, the availability of large amounts of data and improvements in algorithmic techniques have led to significant progress and to its wider adoption across domains.

The rapid growth of healthcare data has created a wide range of technological options for improving patient treatment. ML plays a key role in health-related problems and medical support through computerised analysis tasks such as image registration, image interpretation, image-guided therapy and image retrieval, where errors may be irreversible. ML has wide-ranging social effects in the medical sector [10]: it allows healthcare providers to improve doctor-patient interactions, and it has been used to develop numerous applications, some of which provide doctors with personalised treatment options for patients, record keeping and follow-up appointment scheduling. In the healthcare sector, ML algorithms can support surgical planning by recommending a suitable care setting for each patient [11]. A vast amount of information is currently available in the healthcare industry, including Electronic Medical Records (EMRs), which may contain structured or unstructured data [12]. A structured record contains numerical and categorical fields, such as the patient’s weight and common symptoms like abdominal discomfort, nausea and headache. Unstructured health data include heterogeneous records such as images, audio-visual recordings and final reports. ML algorithms are also useful for recognising complex patterns in large record collections, which is particularly valuable for medical data, especially for patients whose care relies on high-dimensional post-genomic measurements. Generally, there are four basic approaches to ML:

(a) Supervised learning: the algorithm learns from labeled data, in which each data point is associated with a known target. It learns to generalise from the labeled samples in order to make predictions for, or classify, new data points. Examples of supervised learning algorithms include Logistic Regression (LR), Decision Trees (DT), Support Vector Machines (SVM) and Neural Networks (NN).

(b) Unsupervised learning: the algorithm learns from unlabeled data, with no predefined targets, to discover hidden patterns, structures or relationships within the data. Clustering algorithms such as k-means and hierarchical clustering are unsupervised learning algorithms.

(c) Semi-supervised learning: the algorithm uses both labeled and unlabeled data. Semi-supervised learning aims to make better use of the available resources and potentially reduce the need for large amounts of labeled data, which can be expensive and time-consuming to obtain. Its effectiveness depends strongly on the nature of the problem and on the quality of the available labeled and unlabeled data.

(d) Reinforcement learning: the algorithm (an agent) learns how to carry out a multi-step process in an environment whose rules are clearly defined. It receives positive or negative feedback (rewards or penalties) as it works out how to complete the task.
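To make the contrast between the first two paradigms concrete, the sketch below trains a logistic regression model (supervised) on labeled values and runs k-means (unsupervised) on the same values without their labels. It is a minimal, dependency-free illustration on assumed toy data; the feature values and function names are hypothetical, and practical work would normally rely on an established library.

```python
import math


def train_logistic_regression(xs, ys, lr=0.5, epochs=500):
    """Supervised learning: fit p(y=1 | x) = sigmoid(w*x + b) to labeled 1-D data
    by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x):
    # Predict class 1 when the estimated probability is at least 0.5.
    return 1 if w * x + b >= 0 else 0


def kmeans_1d(xs, centroids, iterations=10):
    """Unsupervised learning: Lloyd's k-means on unlabeled 1-D values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in xs:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return sorted(centroids)


# Hypothetical standardised feature values (e.g. a lesion measurement) with labels.
xs = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print([classify(w, b, x) for x in xs])       # [0, 0, 0, 0, 1, 1, 1, 1] (labels used)
print(kmeans_1d(xs, centroids=[-1.0, 1.0]))  # [-2.25, 2.25] (labels ignored)
```

Both methods recover the same two groups here, but only the supervised model knows which group is "disease"; k-means merely finds structure and leaves the interpretation of each cluster to the analyst.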

3.2 Background of Deep Learning

DL, a subfield of ML, focuses on training Artificial Neural Networks (ANN) with multiple layers to learn and make predictions or decisions. It is inspired by the structure and functioning of the human brain, specifically the interconnected network of neurons that process and transmit information. Neural networks have played a role in AI since the 1950s, and systematic methods for training them, based on an algorithm termed backpropagation, were introduced in the 1980s. However, training NNs with many layers proved difficult because of fundamental optimisation problems and the limitations of the computer hardware of the time. As a result, ML research over the following years shifted towards other methods, such as DT and kernel methods. Although ANNs have been used for a long time, three developments have enabled the training of deep NNs in recent years: (1) the availability of large amounts of labelled data, (2) low-cost, powerful parallel computing hardware, and (3) improvements in training methods, performance and architectural design. For image processing, the Convolutional Neural Network (CNN) has become the dominant DL approach. CNNs gained significant attention in 2012, when the winning entry in an annual international image classification competition used a Deep Convolutional Neural Network (D-CNN) to achieve a striking performance improvement over conventional computer vision methods [13]. Deep Neural Networks (DNNs) are designed to learn hierarchical representations of data automatically, by progressively extracting higher-level features from raw input: each layer transforms its input data and passes the result to the next layer [14]. This hierarchical feature extraction allows the model to learn complex patterns and representations, improving accuracy in many applications, including health-related ones [15].
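One convolutional stage of a CNN can be illustrated in plain Python: a convolution (computed, as in common DL frameworks, as a cross-correlation), a ReLU activation and 2 x 2 max pooling. This is a minimal sketch; the 6 x 6 image and the hand-set vertical-edge kernel are assumptions for illustration, whereas a real CNN learns its kernels by backpropagation and stacks many such stages to build the hierarchical features described above.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: the operation a CNN convolution layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(feature_map):
    # Non-linearity: negative responses are set to zero.
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    # Down-sample by keeping the strongest response in each size x size window.
    return [
        [max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(feature_map[0]) - size + 1, size)]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-set vertical-edge kernel; in a CNN these weights are learned.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = max_pool(relu(conv2d(image, kernel)))
print(features)  # [[3, 3], [3, 3]]
```

The kernel responds only where intensity rises from left to right; an edge in the opposite direction produces negative responses that ReLU removes, which is why a CNN needs several kernels per layer.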

4 Machine Learning Techniques Used for Gallbladder Disease

4.1 Gallbladder Cancer

ML enables predictive models to be built more effectively from large data collections by recognising hidden patterns that conventional methods may miss. To diagnose GB diseases, ML has drawn on different forms of information, including clinical, molecular, demographic, pathological and radiological data. Liu et al. [16] combined ML techniques with pre-treatment CT scans to estimate survival in GB carcinoma. They included 141 patients with confirmed GB cancer and used the LIFEx programme to extract the tumour signature. The model was then optimised using the Least Absolute Shrinkage and Selection Operator (LASSO) and the Random Forest (RF) approach, achieving an overall accuracy of 95%.

Ciecholewski et al. [17] demonstrated the usefulness of the AdaBoost algorithm for detecting GB abnormalities such as lithiasis and polyps in ultrasonography images. The classifier processed a rectangular region of the input image of a given size; if the diameter of the segmented regions is substantially larger than the expected input size, a wavelet approximation of the input image can be used. In the best case, the algorithm diagnosed lithiasis with a precision of 91%, polyps with a precision of 80%, and polyps with lithiasis with a precision of 78.9%, so the AdaBoost classification results for lithiasis appear promising. Tsilimigras et al. [18] used ML-based methods, including multivariate analysis, to determine the optimal minimum number of Lymph Nodes (LNs) to evaluate in GB cancer patients. The study included 6531 GB cancer patients and consistently identified at least four LNs.

Chen et al. [19] proposed a computer-aided diagnosis method that combined an ultrasound image segmentation technique with AdaBoost and Principal Component Analysis (PCA) to distinguish between non-neoplastic and neoplastic GB polyps. The method achieved an accuracy of 95% in identifying the GB region and also diagnosed GB polyps accurately. Muneeswaran et al. [20] segmented the GB in ultrasound images using the tree seed optimisation algorithm, applying speckle reduction and feature extraction beforehand. They validated the optimised classifier on real-time clinical datasets of cholelithiasis and cholecystitis and assessed its discriminative performance using conventional evaluation metrics. Park et al. [21] noted that the location, shape and size of the GB in ultrasonography can vary depending on the operator’s expertise. Their initial approach performed well on training data but not on unseen data, so they used an SVM to locate the GB. A total of 750 candidate regions from 90 sonograms were collected to train the SVM, and five brightness-related features were computed for each candidate. The approach correctly located 83 of the 90 GBs. Geng et al. [22] proposed a model for predicting survival in patients with GB cancer and also assessed the importance of adjuvant therapy. A total of 818 patients with curatively resected advanced GB cancer were included. The tree-augmented Naive Bayes (NB) method was used to build a Bayesian Network (BN) survival prediction model, and composite importance measures were used to rank the relevance of the survival parameters. The BN model achieved an Area Under the Curve (AUC) of 77.72% and an accuracy of 69.67%.
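AdaBoost, used in [17] and [19], combines many weak classifiers into a weighted vote and re-weights the training samples after each round so that later classifiers focus on earlier mistakes. The sketch below is a minimal discrete AdaBoost with decision stumps on a hypothetical 1-D feature in which the positive class lies in a middle range, so no single threshold separates it; it does not reproduce the features or configuration of the cited studies, which worked on ultrasound image regions.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    # A decision stump: one threshold on one feature, output +1 or -1.
    return polarity if row[feature] >= threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(w for w, row, yi in zip(weights, X, y)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost with labels in {-1, +1}; returns (alpha, stump) tuples."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, f, thr, pol))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
                   for w, row, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(row, f, thr, pol) for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])  # [-1, -1, 1, 1, -1, -1]
```

Any single stump misclassifies at least two of the six samples here, yet three weighted stumps fit all of them, which illustrates why boosting weak learners can be effective.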
Downing et al. [23] used multivariate analyses to predict survival in 3209 patients with early-stage primary GB cancer. The authors suggested that, in some circumstances, the proposed strategy could improve survival for patients whose GB cancer was detected incidentally. Wu et al. [24] adopted a BN to predict survival in patients with GB cancer. Using data from 628 patients, they built a nomogram model and a BN model based on the independent prognostic factors and compared them with several metrics. The nomogram and BN model had AUCs of 78.22% and 84.14% and, in internal validation, accuracies of 72.17% and 75.65%, respectively. In external validation, their AUCs were 70.19% and 76.46% and their accuracies 60.25% and 66.88%, respectively. Zhang et al. [25] determined the minimum number and optimal range of LNs to evaluate in patients with curatively resected GB cancer in order to maximise survival time, using a BN model to find the appropriate range of examined LNs. Model accuracy was evaluated with a confusion matrix and a Receiver Operating Characteristic (ROC) curve. The study enrolled 1268 patients; the BN model achieved an area under the ROC curve of 78.49% and an accuracy of 72.82%. Zhou et al. [26] built a patient-specific model for a surgical training system, using a semi-automatic approach to segment the GB in 3D from CT images. The method comprised three steps. First, voxel classification was applied. Second, an SVM classifier was used to extract the GB region from a single 2D slice in the middle part of the GB. Third, the extracted GB contour was projected onto neighbouring slices for automatic re-sampling and learning, and this was repeated until all slices containing the GB had been processed. The approach was tested on 18 CT datasets.
The approach proved efficient and promising, with an average volume overlap error of 15.56% and an average surface distance of 0.64 mm. Zhang et al. [27] used the RF approach to identify differences in preoperative features between GB cancer and undetected GB cancer. They first used the chi-square test to compare the two groups and isolate the contributing factors, and then built an RF classification model that achieved an AUC of 0.7310. Both analyses identified biliary calculi, a history of cholecystolithiasis, GB polyps and other clinical features of cholecystolithiasis as relevant factors. Gloger et al. [28] used an SVM to recognise GB regions in volume data from Magnetic Resonance Cholangiopancreatography (MRCP). They constructed a GB shape space from 3D GB shape features and applied a region-based level set technique for fine segmentation. The technique achieved mean Dice coefficients of 0.917 in non-contrast-enhanced sequences and 0.904 in secretin-enhanced sequences. Zhou et al. [29] aimed to develop a diagnostic prediction model, based on clinical data and radiological characteristics, for the differential diagnosis of and clinical decision-making in xanthogranulomatous cholecystitis, a rare benign chronic inflammatory disease of the GB. They used LR and an RF approach to validate the relevant clinical data and radiological characteristics, and then established three models: (1) a CT/MRI model, (2) CT and MRI models, and (3) a diagnostic prediction model. The CT/MRI model achieved a mean accuracy of 0.906, whereas the CT and MRI models achieved mean accuracies of 0.837 and 0.842, respectively. The diagnostic prediction model obtained a mean AUC of 0.888 and a mean accuracy of 0.898.
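The overlap measures reported in [26] and [28] can be computed directly from binary masks. The sketch below assumes the common definitions Dice = 2|A ∩ B| / (|A| + |B|) and volume overlap error VOE = 100 × (1 - |A ∩ B| / |A ∪ B|); the cited studies may differ in implementation details, and the toy masks are hypothetical.

```python
def overlap_metrics(pred, truth):
    """Dice coefficient and volume overlap error (VOE, in %) for two binary masks.

    Masks are equal-length sequences of 0/1 values, e.g. a 3-D segmentation
    flattened voxel by voxel.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Dice: 2 * |A & B| / (|A| + |B|); two empty masks count as a perfect match.
    dice = 2 * intersection / total if total else 1.0
    # VOE: 100 * (1 - |A & B| / |A | B|), i.e. 100 * (1 - Jaccard index).
    voe = 100.0 * (1 - intersection / union) if union else 0.0
    return dice, voe


predicted = [1, 1, 1, 0, 0, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0, 0, 0]
print(overlap_metrics(predicted, reference))  # Dice 2/3, VOE 50.0
```

Because Dice = 2J / (1 + J), where J is the Jaccard index |A ∩ B| / |A ∪ B|, the two measures can be converted into each other: a VOE of 15.56% (J ≈ 0.844) corresponds to a Dice coefficient of about 0.92.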

Park and Kim [30] combined an SVM with a significance test to improve GB discrimination in ultrasonograms, using the SVM to classify the valid features. They reported an accuracy of 96.67%. Ciecholewski [31] segmented the GB from 2D ultrasound data using gradient vector flow and active contour models, which were used to approximate the GB margins; image fragments outside the GB contour were then removed. The method achieved a Dice similarity coefficient of 81.8%. Yuan et al. [32] used morphological and geographical data to distinguish pseudo-polyps from true GB polyps. The study included 96 patients with GB polyps, of whom 55 had cholesterol polyps and 41 had GB tubular adenomas. An SVM was used for differential diagnosis based on the intrinsic features. The method achieved an AUC, accuracy, sensitivity and specificity of 0.898, 0.875, 0.885 and 0.857, respectively.
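The evaluation measures quoted throughout this section (accuracy, sensitivity, specificity, precision, F1-score and AUC) follow from a confusion matrix and from ranked prediction scores. The sketch below is a minimal illustration with hypothetical labels and scores and an assumed decision threshold of 0.5; it computes the AUC through its rank interpretation rather than by integrating the ROC curve, which gives the same value.

```python
def safe_div(num, den):
    return num / den if den else float("nan")


def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics for binary labels (1 = disease, 0 = no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # recall, true-positive rate
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),  # true-negative rate
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve as the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.6, 0.75]  # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5
print(confusion_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))  # 0.75
```

Accuracy, sensitivity and specificity change with the chosen threshold, whereas the AUC summarises the ranking over all thresholds; this is one reason the studies above often report both.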

4.2 Gallstone and Liver

Urman et al. [33] studied the bile canaliculi and bile ducts and visualised them in the GB using an ML technique. Their ML process combined synthetic data with features of real data, collected metabolites and proteins, and analysed them with NNs. Using NNs, they also identified sets of proteins (n = 5) and lipids (n = 10) and efficiently classified patients with and without GB cancer. Samant and Agarwal [34] examined ML models and the diagnostic usefulness of iridology for diabetes to determine whether they could predict the occurrence of an enlarged GB with gallstones. They adapted an automated technique to investigate the link between diabetes and colouration abnormalities in the iris tissue. Spann et al. [35] reviewed the literature on ML in hepatology and liver transplantation, giving an overview of the concepts and limitations of ML technologies and their potential applications in hepatology and medicine. Daghottra and Jain [36] described the main goals of combining ML with robotic technology in the healthcare sector. They also reported on robotic GB surgery, which was found to be highly beneficial because it allowed patients to return to work much sooner than after conventional procedures.

Raji and Chandra [37] used data from the United Network for Organ Sharing (UNOS) to propose a multilayer ANN for liver transplantation, which predicted patient survival after the laboratory tests with 99.74% accuracy. They also compared the proposed network with existing models and reported that it was more reliable. Nahar and Ara [38] employed various decision tree techniques to predict liver disease, with the primary objective of comparing the performance of these techniques. They also compiled current information on liver disease therapy from various parts of the world. Sontakke et al. [39] discussed two ML algorithms that can aid the identification of liver infections by improving diagnostic performance, and suggested ways to increase the efficiency of this approach. El-Shafeiy et al. [40] proposed a hybrid ML classification framework for liver treatment, intended to improve specialists’ ability to classify treatment options and to make sound recommendations for future analyses and healthcare. Jackson et al. [41] used an ML approach to prioritise biomarkers in GB cancer. The authors collected 80 tissue samples from different clinics and hospitals, used feature engineering to reduce the dimensionality of the resulting data, and then applied SVM and RF, which they reported performed better than the other techniques tested. Cotter et al. [42] used preoperative variables to stratify patients with GB cancer into prognostic groups with an ML approach. They suggested that using ML to describe patient prognosis may enable doctors to provide more individualised therapy.
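Decision tree methods such as those compared in [38], and the trees inside an RF, grow by choosing the split that makes the resulting groups as pure as possible. The sketch below assumes Gini impurity as the purity measure and a single hypothetical numeric feature (for example, a laboratory value); an RF would repeat this on bootstrap samples with random feature subsets and combine the trees' votes.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    # The largest value cannot serve as a threshold: it would leave the right side empty.
    for thr in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best


values = [10, 20, 30, 40, 50, 60]  # hypothetical feature values
labels = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
print(best_split(values, labels))  # (30, 0.0)
```

A full tree applies `best_split` recursively to each resulting group until the groups are pure or a depth limit is reached.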

These studies indicate that ML technologies and methodologies can enhance GB disease diagnosis. The data sources and methods used in the studies reviewed above include CT scans, 2D ultrasound data, morphological and geographical data, the AdaBoost algorithm, multivariate analysis, ultrasound image segmentation, PCA, SVM, tree-augmented NB, BN, RF, LR and robotic technology combined with ML. ML methods have recently been used in hepatology to study the wealth of radiological, clinical, pathological and biological information on cholecystitis-related liver disease. Applying these technologies to the difficulties of liver disease diagnosis is expected to help identify better drug candidates and therapeutic approaches. Nevertheless, more accurate clinical methods are still required to diagnose GB diseases.

The summary of the above-mentioned studies is presented in Table 1 and Fig. 3.

Table 1 ML literature summary

Fig. 3 Summary of the ML studies

According to Table 1, the highest accuracy achieved with ML was 99.74%, obtained with a multilayer ANN, and the lowest was 69.67%, obtained with the tree-augmented NB method. All the algorithms used show relatively good results, but the choice of the most suitable algorithm depends on the specific disease to be detected, the available data and other factors. Across all GB disease types, SVM is the most frequently used ML algorithm, and it is particularly effective for deciding on the presence or absence of disease. RF performs well in measuring feature importance, especially with high-dimensional data, which helps explain the role of different features in diagnosis. LR also provides insight into the importance of features in the diagnosis of GB diseases. Combining two or more algorithms tends to yield better results.
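One simple way of combining algorithms is hard majority voting over the predictions of several trained models. The sketch below uses hypothetical SVM, RF and LR outputs in which each model errs on a different case; the gain shown here depends on the models making different errors and is not guaranteed in general.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Hard voting: for each case, return the label predicted by most models.
    With an odd number of binary classifiers there are no ties."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


truth = [1, 0, 1, 1, 0]
# Hypothetical outputs of three trained models, each wrong on a different case.
svm_pred = [1, 0, 1, 0, 0]
rf_pred = [1, 1, 1, 1, 0]
lr_pred = [0, 0, 1, 1, 0]
combined = majority_vote(svm_pred, rf_pred, lr_pred)
print([accuracy(truth, p) for p in (svm_pred, rf_pred, lr_pred)])  # [0.8, 0.8, 0.8]
print(accuracy(truth, combined))                                   # 1.0
```

If all three models failed on the same case, voting could not recover it, so ensembles tend to help most when their members are diverse (for example, different algorithms or different feature sets).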

The major issues with these studies are the small size of their datasets, which can limit performance even when the ML algorithm itself performs well, and the lack of publicly available data, which leads to a variety of datasets and hence to results that are difficult to compare.

To conclude, future ML work should concentrate on training and validating systems on large datasets from diverse clinical contexts to improve accuracy, and on making more open-source data available.

5 Deep Learning Techniques Used for Gallbladder Disease

5.1 Gallbladder Cancer

DL uses a layered architecture to analyse data and can automatically process medical imaging data, such as CT and MRI scans and Laparoscopic Cholecystectomy (LC) images, for GB diagnosis. In this way, DL helps clinicians to evaluate disease and to provide patients with appropriate treatment. Chang et al. [43] investigated the utility of a Backpropagation Neural Network (BPNN) and a Genetic Algorithm (GA) for detection and prognosis based on tumour markers in GB cancer. The study included 446 patients with GB cancer, 279 patients with benign GB disease and 188 healthy individuals. The authors found GB cancer to be linked with the CA125, CEA, CA242 and CA199 tumour markers, and the proposed method achieved a specificity of 87.49% and a sensitivity of 91.72% using CA199 and CA242. Jeong et al. [44] developed a DL-based Decision Support System (DL-DSS) for malignant GB cysts using ultrasonography. The study included 535 patients, and transfer learning was used to build a binary classification CNN model. The DL-DSS achieved an AUC of 0.92.

Loukas and Schizas [45] proposed a DL methodology, based on computer vision analysis of LC images, for evaluating the internal vascularity of the GB wall. They used a CNN for patch classification with two ground-truth annotation schemes: (1) three classes and (2) two classes. For comparison, three well-known classifiers combined with a large number of hand-crafted descriptors were also evaluated. The CNN performed best, with accuracies of 83% and 98% and mean F1-scores of 80.4% and 98% for the three-class and two-class tasks, respectively. Loukas et al. [46] reported a Multiple-Instance Learning (MIL) technique for assessing GB wall vascularity using computer vision analysis of images from LC operations. They used a dataset of 181 GB images from 53 procedures and compared their MIL technique, which is based on variational Bayesian inference, against other state-of-the-art approaches. The method achieved accuracies of 92.1% and 90.3% for the first and second tasks, respectively.

Zhou et al. [47] developed a DL approach for evaluating biliary atresia from sonographic GB images. At the patient level, the model achieved a specificity of 93.9% and a sensitivity of 93.1%. Such DL techniques can help radiologists improve biliary atresia assessment in a variety of medical settings, particularly in underdeveloped areas with a shortage of specialists. Loukas et al. [48] proposed a DL approach for assessing GB wall vascularity from LC images. The authors used 800 patches and 181 outlined GB wall regions from the Cholec80 video dataset, which two expert surgeons annotated using two labelling schemes: (1) two classes and (2) three classes. For patch classification, the model achieved accuracies of 94.48% (two classes) and 83.77% (three classes); for classification of GB wall regions, the best model achieved 91.16% (two classes) and 80.66% (three classes). Kim et al. [49] used DL algorithms to distinguish true polyps in ultrasound images of GB polyps smaller than 20 mm. The study enrolled 501 patients with GB polyps, and an ensemble of three CNN models, evaluated with fivefold cross-validation, was used to assess abdominal ultrasound images. Using the ultrasound images alone, the ensemble model diagnosed true polyps with an accuracy of 83.63% and an AUC of 0.8960. After polyp size and patient age were incorporated, its performance improved to an AUC of 0.9082, a specificity of 88.35% and an accuracy of 87.61%.
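The fivefold cross-validation and model ensembling used in [49] can be sketched as follows: `k_fold_splits` yields train/test index sets, and `soft_vote` averages the probabilities of several models. This is a minimal sketch, not the authors' pipeline; the function names and seed are hypothetical, and averaging probabilities is an assumption about how the ensemble was combined. When several images come from the same patient, folds should be formed over patients rather than images to avoid leakage between training and test sets.

```python
import random


def k_fold_splits(n_samples, k=5, seed=None):
    """Yield (train_indices, test_indices) for k-fold cross-validation.
    Every sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def soft_vote(*model_probabilities):
    """Average the per-case probabilities predicted by several models."""
    return [sum(ps) / len(ps) for ps in zip(*model_probabilities)]


folds = list(k_fold_splits(12, k=5, seed=0))
print([len(test) for _, test in folds])  # [3, 3, 2, 2, 2]
# Hypothetical probabilities from three CNNs for two cases.
print(soft_vote([0.9, 0.2], [0.7, 0.4], [0.8, 0.3]))  # approximately [0.8, 0.3]
```

In each fold, every ensemble member would be trained on `train` and its averaged predictions scored on `test`; the reported metric is then the mean over the five folds.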
Gerkema et al. [50] used a DL method to identify bile leakage during LC. The study used LC videos of 172 patients, drawn from the Cholec80 dataset and from procedures performed at the Meander Medical Centre. The videos were converted into frames, giving 62,380 images with and without bile leakage. Two CNNs were trained with various parameter configurations to obtain the best bile leakage detection model, which achieved a specificity of 80%, a sensitivity of 83% and an AUC of 0.91. Yao et al. [51] used a DL model to recognise gallstones from massive amounts of Internet of Things (IoT) data, building a CNN to extract features from the acquired images.

5.2 Gallstone and Liver

Adegun et al. [52] investigated utilising a DL method for health-related photos. They also presented a medically upgraded Fully Convolutional Network (FCN) U-Net approach of image analysis to cure disorders like brain tumours, retinal problems and skin cancer. The overall accuracy rate was 90%. Furthermore, this indicated that using fuzzy approaches, the photos might be well pre-processed. DL was utilized by Reza et al. [53] to automate liver segmentation. They also claimed that their research was the first to recommend a CNN-based mechanised subdivision technique for the liver in National Health Portal (NHPs) with acute bacterial issues. Zeng et al. [54] presented an SFNet for manually creating clinical remarks of health photos. They introduced two development models to improve the efficiency: (1) the lesion area detection model and (2) the clinical statement generation model. Their research supported the use of DL in computer-assisted physician analysis. They also added 23 studies involving the modification of lesions in clinical statements made by software. For stomach picture recognition, Rehman and Khan [55] proposed DL algorithms. They proposed searching for enhanced image development and executing threshold-friendly algorithms. An emphasis was placed on the record’s limitations, which leads to a never-ending duty in health imaging, in addition to looking into modern CNN models. Through ultrasound scanning, Reddy et al. [56] established a unique structure for accurately organising the abdominal organs. Current popular working representations include VGG (Visual Geometry Group), GoogleNet, ResNet, AlexNet and Inception. For the ResNet-50 constructed model, the recommended architecture achieved 98.77%. They also addressed the drawbacks of multi-class grouping with a single label. According to Ker et al., numerous DL strategies were investigated to better the physiology and morphology of the human body and disorders [57]. 
They also analysed the progress of CNNs in medical applications, with the main goal of providing a more structured review of these developments in medical imaging. Obaid et al. [58] used DL approaches and GB ultrasound images to detect biliary atresia. They used four DL models (ResNet152, VGG16, InceptionV3 and MobileNet), of which MobileNet performed best, with an accuracy of 97.87%, a specificity of 97.51% and a sensitivity of 98.18%. Fujita et al. [59] used DL techniques to diagnose GB tumours. This study used CT imaging to differentiate between xanthogranulomatous cholecystitis and GB cancer, with the aim of improving the therapeutic success of surgery. The average sensitivity, specificity and accuracy on the validation dataset were 98.8%, 98.0% and 98.5%, respectively, and the area under the ROC curve (AUC) was 0.9985.
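Sensitivity, specificity and accuracy, as reported by Obaid et al. [58] and Fujita et al. [59], are linked through the class prevalence of the evaluation set. Writing TP, TN, FP and FN for the confusion-matrix counts and introducing the symbol π (for illustration only) for the fraction of positive cases:

```latex
\begin{aligned}
\mathrm{Se} &= \frac{TP}{TP + FN}, \qquad
\mathrm{Sp} = \frac{TN}{TN + FP}, \qquad
\pi = \frac{TP + FN}{TP + TN + FP + FN}, \\
\mathrm{Acc} &= \frac{TP + TN}{TP + TN + FP + FN}
             = \pi \, \mathrm{Se} + (1 - \pi) \, \mathrm{Sp}.
\end{aligned}
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, it always lies between the two. The reported figures are consistent with this: the accuracy of Fujita et al. (98.5%) lies between their specificity (98.0%) and sensitivity (98.8%), and the MobileNet accuracy of Obaid et al. (97.87%) lies between its specificity (97.51%) and sensitivity (98.18%).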

The studies mentioned above used several DL approaches to improve GB disease detection. In this study, we focussed on DL approaches including BPNN, GA, DDL, computer vision, MIL, CNN models, IoT coupled with DL, FCN and the U-Net methodology. These algorithms are now increasingly employed in hepatology to investigate choledocholithiasis, acalculous biliary dyskinesia and other conditions associated with liver disease. Applying these DL algorithms to the challenges of liver disease is likely to help identify more effective treatments. However, more accurate clinical methods for GB diseases are still needed. The above-mentioned studies are summarised in Table 2 and Fig. 4.

Table 2 DL literature summary
Fig. 4 Summary of the DL studies

According to Table 2, the highest accuracy among the DL studies was 98.77%, achieved by Reddy et al. [56], who proposed a novel architecture using VGG, GoogleNet, ResNet, AlexNet and Inception. A CNN-based approach was used to determine the internal vascularity of the GB. The lowest accuracy among the DL studies, 83%, was obtained with a CNN; to reach this accuracy, the authors combined the CNN with Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Naive Bayes (NB) classifiers. Thus, DL with transfer learning gave better results than conventional CNN-based DL, especially on large datasets. Future DL work could explore more complex architectures, such as NASNet and DenseNet, as well as image pre-processing techniques and more advanced CNN models. It should also focus on developing new models tailored to GB disease diagnosis.
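The combination mentioned above, where a CNN is paired with a classical classifier such as SVM, KNN or NB, typically uses the CNN as a feature extractor and lets the classical model make the final decision. The Python sketch below illustrates only the KNN step under stated assumptions: the feature vectors stand in for CNN embeddings (for example, penultimate-layer activations of a pretrained network), the CNN itself is omitted, and all names, class labels and values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_feats: List[Sequence[float]], train_labels: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training embeddings closest to the query."""
    # Sort training indices by distance to the query embedding.
    order = sorted(range(len(train_feats)),
                   key=lambda i: euclidean(train_feats[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]  # label with the most votes


# Hypothetical 2-D "CNN embeddings"; real embeddings have hundreds of dimensions.
train_feats = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
               [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
train_labels = ["normal", "normal", "normal", "gallstone", "gallstone", "gallstone"]
print(knn_predict(train_feats, train_labels, [0.82, 0.88]))  # gallstone
```

An odd k avoids most voting ties in the two-class case; in practice the embeddings would be extracted once from the trained or pretrained CNN and then reused by each classical classifier being compared.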

6 Discussion

Our study aims to compare the efficacy of different ML and DL techniques in diagnosing GB diseases. To this end, we explore techniques ranging from DT and ensemble models to CNNs and DNNs. These techniques are assessed based on their prominence in the literature and their potential applicability to GB disease diagnosis. This research provides a thorough analysis of published studies using ML- and DL-based AI to detect GB diseases. In total, 44 studies from 2010 to 2022 are included in this review, all of which apply AI, albeit with various approaches.

Our study shows variable performance across different techniques and datasets. Generally, DL techniques, particularly DNNs, exhibited superior performance on image-based datasets, thanks to their ability to learn features automatically. ML techniques, on the other hand, showed robust performance on structured data, such as clinical records, where feature engineering played a significant role. However, these results depended on the quality and size of the datasets. The results reveal the importance of matching the technique to the type of data. They also highlight the need for large, high-quality datasets for training and testing the models. Because no open-source database was available, most studies relied on different datasets.

Despite the broad range of studies reviewed and the insights gained from this survey, there are several limitations to our work. First, the rapid pace of advancements in the field of deep learning for GB disease diagnosis means that newer research may not have been included in this review. Second, the studies included in this review used a variety of ML and DL techniques, methodologies and performance metrics, which makes a direct comparison of their results difficult. Moreover, many of the reviewed studies rely on relatively small and homogeneous datasets, which may limit the generalizability of the findings. The performance of deep learning models is heavily dependent on the quantity and quality of the training data, and the models may not perform as well when applied to different populations or more diverse datasets. Also, many studies did not provide sufficient details about the datasets, the models' architectures or the hyperparameters used, which can make it difficult for other researchers to reproduce the results. Finally, deep learning models, in general, suffer from a lack of interpretability, which can make it challenging for clinicians to trust and adopt these models in a clinical setting.

Looking forward, there are several promising directions for future research in this area. One important area is the development of larger, more diverse and publicly available datasets, which could help improve the generalizability of these models and facilitate more direct comparisons between different techniques. Indeed, building on this study, Obaid et al. [60] built a large dataset of 10,692 ultrasound images classified into nine classes of GB-related diseases and used deep learning approaches to classify these images, detecting the nine GB disease types with specificities and sensitivities above 90%.
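For a nine-class problem such as the one described above, sensitivity and specificity are commonly reported per class in a one-vs-rest fashion, treating one disease type as positive and all others as negative. The Python sketch below shows one standard way to derive these values from a multi-class confusion matrix; we do not claim it is the exact procedure of Obaid et al. [60], and the 3-class matrix is hypothetical toy data.

```python
from typing import Dict, List, Tuple


def per_class_sens_spec(cm: List[List[int]]) -> Dict[int, Tuple[float, float]]:
    """One-vs-rest (sensitivity, specificity) for each class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    result = {}
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        tn = total - tp - fn - fp                  # neither true nor predicted as c
        result[c] = (tp / (tp + fn), tn / (tn + fp))
    return result


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
for c, (se, sp) in per_class_sens_spec(cm).items():
    print(c, round(se, 3), round(sp, 3))
# 0 0.8 0.95
# 1 0.9 0.95
# 2 0.9 0.9
```

The same function applies unchanged to a 9 x 9 matrix; a claim such as "above 90% for every class" then corresponds to checking the minimum of these per-class values.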

7 Conclusion

GB diseases are now very widespread; hence, their accurate detection is crucial. Our literature review provided a comprehensive assessment of published studies using ML- and DL-based AI to diagnose GB diseases. The use of AI is not intended to replace doctors but to support their decision-making and improve diagnostic accuracy. In this study, 44 publications from 2010 to 2022 were included and assessed based on their performance metrics. Although there are several academic papers on GB disease diagnosis, only a limited number investigated which algorithms give the most definitive results.

Both ML and DL have been explored and have given good results for GB disease diagnosis. Whilst DL has demonstrated its potential in GB disease diagnosis, ML techniques can still be valuable in certain scenarios, especially when labeled data are limited or interpretability is a priority. The choice between DL and ML approaches should therefore consider the availability of data, the specific diagnostic task and its interpretability requirements. Besides, by comparing various studies, the survey identifies the most accurate and efficient AI techniques currently in use according to the nature of the disease and the data used. This information can guide future research and application of AI in medical diagnostics in general, and in GB diagnosis in particular, leading to advancements in the field and thus better outcomes for patients. Moreover, the survey bridges the gap between the fields of AI and medicine, promoting the interdisciplinary understanding and collaboration that are often key to innovation. Finally, this survey contributes to the broader efforts to integrate AI into the healthcare sector and to transform many aspects of medicine, from diagnosis to treatment planning. Overall, this survey may be a valuable resource for academics interested in this field of study as well as for specialists in the domain.

Further improvements may be possible with new image recognition technologies, enabling more accurate and rapid diagnosis, which can improve patient care and overall outcomes. Another key direction is the development of more interpretable deep learning models. This could involve techniques such as attention mechanisms or layer-wise relevance propagation, which can provide more insight into the model's decision-making process. Furthermore, future work should focus on developing hybrid models that leverage the strengths of both ML and DL. Ultimately, integrating these advanced techniques into clinical practice requires not only technical advances but also careful consideration of ethical and regulatory aspects.
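To make layer-wise relevance propagation (LRP) more concrete, the Python sketch below implements its epsilon rule for a single fully connected layer: each output's relevance R_j is redistributed to the inputs in proportion to their contributions a_i * w_ij to the pre-activation z_j, that is R_i = sum_j a_i * w_ij / (z_j + eps * sign(z_j)) * R_j. A full LRP explanation applies such a rule backwards layer by layer, from the model's output down to the input pixels. This is a didactic sketch, not a production implementation; the function name and the toy weights are hypothetical.

```python
from typing import List


def lrp_epsilon_dense(a: List[float], w: List[List[float]], b: List[float],
                      relevance_out: List[float], eps: float = 1e-6) -> List[float]:
    """Redistribute output relevance to the inputs of one dense layer (LRP-epsilon).

    a[i]: input activations; w[i][j]: weight from input i to output j;
    b[j]: bias of output j; relevance_out[j]: relevance assigned to output j.
    """
    n_in, n_out = len(a), len(b)
    # Pre-activations z_j = sum_i a_i * w_ij + b_j.
    z = [sum(a[i] * w[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    # Stabilise the denominator by pushing it away from zero (sign(0) taken as +1).
    denom = [zj + (eps if zj >= 0 else -eps) for zj in z]
    return [sum(a[i] * w[i][j] / denom[j] * relevance_out[j] for j in range(n_out))
            for i in range(n_in)]


# Hypothetical toy layer with zero bias: 2 inputs, 2 outputs.
r_in = lrp_epsilon_dense([1.0, 2.0], [[0.5, -1.0], [0.25, 1.0]], [0.0, 0.0], [1.0, 0.5])
print(r_in)       # approximately [0.0, 1.5]
print(sum(r_in))  # approximately 1.5, the total relevance of the outputs
```

With zero bias and a small eps, the total relevance is (approximately) conserved from one layer to the next, which is the property that lets clinicians read the final input-level relevance map as an attribution of the model's decision; a non-zero bias or a larger eps absorbs part of the relevance.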