Introduction

The number of people diagnosed with cancer is rising at an alarming rate, and cancer is the leading cause of death worldwide. The GLOBOCAN 2018 study estimated that 18.1 million people were newly diagnosed with cancer and 9.6 million died from the disease in 2018 [1]. Japan, China, and Korea have the highest cancer mortality rates in East Asia. For stomach cancer, the pre-diagnosis stage is crucial to the outcome [2]. The American Cancer Society estimates that 27,600 cases of stomach cancer will be detected in adults in the United States this year, and that 11,010 men and women will die from the disease [3]. A 2018 survey examined the prevalence of gastrointestinal (GI) illness in industrialized nations [4]. It found that about one-fifth of Brazil's adult population suffers from gastrointestinal disorders. The United States has the highest rate of GI illness and infection at 22%, followed by the European Union-5 (21%), Russia (12%), Brazil (12%), China (10%), and Japan (10%) [4]. A comparative examination of mortality rates owing to GI tract disease is reported in Ref. [5]: with 76.1 deaths per 100,000 people, Hungary has the highest death rate of the countries studied, while the United States stands at 20.5, the United Kingdom at 22.1, and Iceland at 11.3. The rising mortality rate from stomach cancer can largely be attributed to late diagnosis. The best way to reduce annual mortality is to detect and treat the disease as soon as it develops: early detection of stomach disease allows more effective and less burdensome therapy to be delivered, and non-invasive treatment alternatives become viable [6, 7]. Diseases of the GI tract affect between 60 and 70 million people in the United States [8].

Endoscopic procedures have evolved over time, and wireless capsule endoscopy (WCE) is the outcome. In addition to providing non-invasive imaging, WCE also lessens patient discomfort through the use of advanced image-capture technologies [9]. WCE has had a substantial impact on medical imaging since it was first introduced in 2001 [10]. Each WCE capsule measures around 25 mm in length and 11 mm in diameter. The capsule records RGB images as it travels through the GI tract and has a battery life of eight hours of continuous recording. The patient wears a device that captures and saves the images so that the patient's state is documented throughout the examination. Medical professionals then use the WCE images to make their diagnoses [11]. During a single examination of a single patient, capsule endoscopy (CE) generates over 50,000 individual images. Even for an experienced medical professional, going through all the images and analyzing them takes a significant amount of time and effort. In addition, because only about 5% of the images in the collection depict diseased areas, it is extremely difficult, if not impossible, to recognize, detect, and localize infected regions [11, 12]. When there is an excessive number of patients with gastritis, the risk of faulty abnormality detection grows, which leads to a larger percentage of false positives (FP) [13]. The identification, analysis, and categorization of infected areas in WCE images is a new area of computer vision-based medical imaging. Detecting cancer from stomach frames can be difficult: many factors can lead to incorrect classification, including a lack of contrast and illumination, texture diversity, body position, and camera defects such as insufficient focus [14].

To date, several machine learning (ML) [15, 16] algorithms have demonstrated that artificial intelligence (AI) is capable of successfully automating the diagnostic process for different diseases. The term "ML" describes models that can learn and draw conclusions from massive amounts of data. An AI system may perform tasks that require human intelligence, such as speech recognition and translation, by calculating and predicting based on the input it receives. Deep learning (DL) refers to a family of ML algorithms that focuses on the extraction and categorization of visual data; it has shown significant potential in a wide range of applications, particularly in health care [17, 18]. Using images, DL can more accurately predict and identify various diseases, such as breast cancer [19], liver disease [20], colon cancer [21], brain tumor [22], skin cancer [23], lung cancer [24], pneumonia [25], and, most recently, COVID-19 [24]. Unlike classical ML classifiers, DL requires no human involvement in feature engineering: its major benefit is that it learns increasingly abstract representations of the input data as the network grows deeper. As a result, the model can autonomously gather information and generate more accurate insights. Instead of using linear functions to define features, DL techniques combine nonlinear functions to attain higher levels of model precision than typical ML algorithms.

Several studies [13,14,15,16,17,18,19,20,21,22] apply DL classifiers to classify GI illnesses using WCE images. The majority of these studies employed CNNs with small datasets to identify and detect GI bleeding from WCE images, as described in [24,25,26,27,28,29,30,31,32,33,34,35]. In addition, the research in [24,25,26,27,28,29,30,31] concentrated on identifying GI bleeding and differentiating it from healthy tissue and other stomach disorders. To make matters more difficult, the authors of Reference [35] assert that the images received using WCE contain several inaccuracies. One of the problems is the lack of adequate control over the movement of the camera and of the organ [27]. When using WCE images, the diagnoses of ulcerative colitis, polyps, and dyed-lifted polyps can overlap, especially if the individual making the diagnosis has less experience or if the patient's history is difficult to obtain. To ensure a reliable diagnosis of any of these three disorders, an automated system is essential. Yet no classification model exists for detecting these three disorders of the GI tract together, which is what encouraged us to design such a DL model in this study. Consequently, a multi-classification model based on DL approaches is exploited to classify GI disorders from WCE images, taking advantage of the benefits that DL methodologies provide. This study gives a full explanation of each architecture, and the best of them is determined based on the results, which may reach greater detection accuracy.

In summary, the key contributions of the present work are stated as follows:

  1. This work proposes four CNN-based architectures, i.e., Vgg-19 + CNN, ResNet152-V2, GRU + ResNet-152 V2, and ResNet152V2 + Bi-GRU, for classifying WCE images into four classes: ulcerative colitis, polyps, dyed-lifted polyps, and normal.

  2. Using data augmentation techniques, additional copies of the WCE image data were generated by applying horizontal flip, rotation, right and left shift, vertical flip, and skewing.

  3. The experimental evaluation of the proposed DL classifiers was performed on four publicly accessible benchmark datasets, and the results were compared with modern state-of-the-art methods.

  4. Compared to the other three DL classifiers, Vgg-19 + CNN demonstrated exceptional performance in terms of accuracy (99.45%), recall (98.90%), specificity (99.75%), MCC (0.9810), and F1-score (98.84%).

The following is the current work's structure: The section "Related Work" presents the recent related work of DL classifiers for the detection of ulcerative colitis, polyps, and dyed-lifted polyp diseases. The section "Materials and Methods" shows the materials and methods, which include the datasets chosen, data pre-processing, proposed DL models, and performance metrics. Extensive experimental data and discussion are presented in the section "Results and Discussion". The findings of this study as well as recommendations for further research are presented in the section "Conclusion".

Related work

WCE images showing ulcers, hemorrhages, and polyps present the greatest challenge for medical professionals attempting to differentiate between healthy and infected subjects in an image. Classification is feasible using image processing, ML, DL, and computer vision, and several different strategies have been proposed and put into practice for classifying WCE images. The primary focus of this section is the categorization of infected and healthy images. A survey of the relevant research reveals that the classification of WCE images can be accomplished with one of three primary classes of feature extraction approaches: handcrafted, DL, and hybrid methods.

Xing et al. [36] developed a three-stage method for the identification of bleeding. In the first processing step, tasks such as keyframe extraction and edge removal are carried out. After that, superpixel color histogram (SPCH) features determined by the principal color spectrum are applied to differentiate the bleeding frames, and a KNN classifier makes the final determination. To separate the bleeding patches at the superpixel level, nine-dimensional color feature vectors are extracted from the different color spaces. A new computer-aided automatic detection approach for the diagnosis of GI illnesses was proposed by Attique et al. [37]. The ulcer region is reconstructed from the WCE video using color information computed with the LHS method CFb. CNN features extracted from the DenseNet model are then optimized using Kapur's entropy. The most important features are input into an MLP classifier, which ultimately achieves an accuracy of 99.50%. Jia et al. [38] offer a method that uses deep CNNs to automatically recognize bleeding in the GI tract. Bleeding-detection performance is examined on a large WCE dataset containing more than 10,000 annotated images. Their eight-layer neural network (NN) has three convolutional layers, three pooling layers, and two fully connected layers (FCL). The batch size is set to 100 examples, and the learning rate, momentum, and weight decay parameters are set to 0.001, 0.9, and 0.004, respectively. Bleeding frames are finally detected by attaching an SVM classifier to the second FCL. The authors of [39] extracted DWT, DCT, and CNN features using the Vgg16 model, and these features were used to categorize images. The most desirable features are chosen with the assistance of a genetic algorithm (GA) built on KNN.
To evaluate the proposed method, a dataset was produced by integrating four existing datasets, on which a precision of 96.5% was achieved. Meshan et al. [40] developed a strategy based on recurrent CNNs. They attained 99.13% accuracy with a cubic SVM classifier by combining the grasshopper (GH) technique, mask RCNN, and a minimum-distance fitness function.

According to Sekuboyina et al. [41], WCE image analysis can be automated by first dividing the image into multiple patches and then using a CNN to extract characteristics relevant to each block of the image. This strategy not only mitigates the downsides of hand-crafted features but also increases their utility. They attained a sensitivity of 71.19% and a specificity of 72.33%. Ghosh et al. [42] trained CNNs using SegNet layers with three different classes. After the endoscopic image has been divided into two halves, the trained network is utilized to locate the bleeding spots within the image. When working with a large number of color planes, HSV is the color space that provides the best results. The algorithm attained a global accuracy of 94.42% when evaluated on a publicly accessible clinical dataset. Small-bowel red lesions were detected and segmented by Coelho et al. [29] using a DL U-Net architecture. This U-Net was analyzed on an annotated sequence against the Suspected Blood Indicator (SBI) and cutting-edge techniques; it outperformed the competition in lesion detection and segmentation, beating current lesion-detection methods by 1.78%. Li et al. [43] proposed an SVM-based technique that identifies bleeding using chrominance moments and uniform LBP as color texture features, applied to a sample of ten patients. Transfer learning (TL) was put to the test for the diagnosis of GI bleeding in [44], using an ImageNet-trained V3 model; the percentage of positive samples in the CNN training sets was raised by resampling to improve the algorithm. According to Diamantis et al. [45], WCE images can be utilized to aid in the diagnosis of polyps, as well as bleeding and ulcers in the digestive tract.
To automatically identify ulcers and erosions, a CNN model referred to as the single-shot multi-box detector (SSD) was utilized in [46]. MLP and CNN models are utilized in [47] to automatically identify bleeding spots. The authors of [48] suggest a deep cascade network for the detection of anomalies in WCE images, making use of a CNN model (Fast-RCNN), TL, and a variety of other modalities. In [49], CNN-based models were used for the classification of normal and malignant ulcers. The three CNN architectures employed in this method were ResNet, VggNet, and InceptionV3, all pre-trained on ImageNet before being used.

Previous studies have demonstrated the need for a multiclassification method that is significantly more accurate than existing classifiers in terms of accuracy (ACC), sensitivity (SEN), specificity (SPC), area under the curve (AUC), and F1 score for distinguishing healthy WCE images from those that contain ulcerative colitis, polyps, intestinal bleeding, and dyed-lifted polyps. To accurately classify the WCE images, a total of four DL models are utilized: Vgg-19 with CNN, ResNet152V2, ResNet152V2 with GRU, and ResNet152V2 with Bi-GRU. The key challenge of this work is developing an algorithm capable of learning and extracting the most important information from the images in order to classify each image into one of three GI disease classes (ulcerative colitis, polyps, and dyed-lifted polyps) or the normal class. The results of several recent studies on diseases of the digestive tract are compared in Table 1.

Table 1 Existing research on diseases of the GI system

Materials and methods

Ulcerative colitis, polyps, intestinal bleeding, and dyed-lifted polyps are some of the common diseases that can affect the structure of the human GI tract. WCE images play a significant role in the diagnosis of these diseases. DL, a subfield of AI, is currently being applied to disease diagnosis to improve accuracy. Diagnoses of ulcerative colitis, polyps, intestinal bleeding, and dyed-lifted polyps are all made with the use of a DL classification algorithm. The goal of the study is to construct a DL model for identifying these diseases utilizing four distinct models. To the best of our knowledge, this study can be considered the first work toward offering a single DL model for the classification of a collection of GI disorders. It eliminates the need for medical practitioners to use a variety of applications to classify each ailment individually, saving them time in the process. Figure 1 depicts the proposed model's block diagram.

Fig. 1

Block diagram for the classification of GI diseases

The three stages of the model are depicted in Fig. 1: pre-processing, feature extraction, and classification. Within the framework of the proposed model, there are four distinct categories: normal, ulcerative colitis, polyps, and dyed-lifted polyps. The model takes the WCE images of the GI tract as input. In the first step, pre-processing techniques are carried out, including data augmentation and the random division of the data into three sections: training, validation, and testing, with proportions of 70%, 10%, and 20%, respectively. To rescale the pixel values of the images to the interval [0, 1], a data normalization technique is employed, which converts each WCE image to an array of pixels. The extraction of features is the second step. Categorization of the images into their respective classes is the third phase, accomplished through the application of several DL models, which will be covered under the following headings. Normal, ulcerative colitis, polyps, and dyed-lifted polyps are the types of input WCE images for the presented model. These input WCE images are resized to a resolution of 299 × 299 × 3. To achieve accurate results while avoiding overfitting, the number of training images has been increased through data augmentation techniques such as horizontal flipping, rotation, right and left shifting, vertical flipping, and skewing.

Datasets descriptions

For our study, several publicly accessible databases containing WCE images were collected. In addition to normal images, this collection consists of polyps, ulcerative colitis, and dyed-lifted polyps images taken with WCE. A collection of about 500 WCE images was chosen for the study of ulcerative colitis [27]. CVC-ClinicDB, with about 612 WCE images, is the most frequently used medical dataset for the classification of polyps [28]. The proposed CNN models have been trained on this dataset to separate polyps from other forms of GI disease, such as ulcerative colitis or dyed-lifted polyps. The Kvasir dataset is also utilized for collecting images of polyps [27]. The third dataset contains 500 WCE images of dyed-lifted polyps obtained from [24]. Finally, we collected 2325 normal images from the Red Lesion Endoscopy dataset [29]. From all datasets, a total of 4437 WCE images were retrieved. Figure 2 shows sample images of ulcerative colitis, polyps, and dyed-lifted polyps.

Fig. 2

Dataset samples of each disease

Figure 3 depicts the number of patients across all age ranges who were diagnosed with GI diseases. For ulcerative colitis, patients were between 42 and 59 years of age; for polyps, between 32 and 65 years; for the dyed-lifted polyps dataset, between 26 and 52 years; and for the normal patients' dataset, between 28 and 60 years.

Fig. 3

Age-wise distribution related to each disease

Dataset pre-processing

All of the datasets included in this research were normalized by applying several image augmentation techniques, which increased the size of the datasets and made them more comparable. We accumulated around 4437 WCE images, and after applying various augmentation methods, we have a total of 10,000 WCE images across all GI diseases. In the first stage of the augmentation procedure, the datasets corresponding to the four training categories are segmented as follows: normal, ulcerative colitis, polyps, and dyed-lifted polyps. The pre-processing stage then takes these WCE images as input. Normally, this pre-processing step ensures that the input image meets the CNN-based model's specifications. For the current work, we applied several distinct pre-processing steps to the WCE images. First, the WCE images were resized to a fixed resolution of 299 × 299 × 3. Second, the number of resized images was increased using augmentation techniques such as horizontal flip, rotation, right and left shift, vertical flip, and skewing. Afterward, the data normalization method is applied to all WCE images; in this final step, the WCE images are turned into arrays for use in the next phase of the model. The dataset images are randomly divided into three portions, 70% for training, 10% for validation, and 20% for testing, to ensure that the images are diverse enough to train and test the models.
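The augmentation and normalization steps above can be sketched with Keras' `ImageDataGenerator`; the specific parameter values (rotation range, shift fraction, shear intensity) are illustrative assumptions, not the authors' exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation + [0, 1] normalization for the 299 x 299 x 3 WCE images.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to the interval [0, 1]
    horizontal_flip=True,    # horizontal flip
    vertical_flip=True,      # vertical flip
    rotation_range=20,       # rotation (assumed degree range)
    width_shift_range=0.1,   # right and left shift (assumed fraction)
    shear_range=0.1,         # skewing (assumed intensity)
)

# Typical usage (the directory layout is hypothetical):
# train_gen = augmenter.flow_from_directory(
#     "wce/train", target_size=(299, 299), batch_size=32,
#     class_mode="categorical")
```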

Proposed methods

To create the proposed models for identifying GI disorders, the current research makes use of several kinds of DL models. Ulcerative colitis, polyps, and dyed-lifted polyps are the three GI disorders examined in the study to determine which proposed method is the most effective. These models incorporate distinct combinations of CNN and RNN layers. Two well-known TL classifiers, Vgg-19 and ResNet152-V2, are applied together with a simple CNN, a gated recurrent unit (GRU), and a bidirectional gated recurrent unit (Bi-GRU) as forms of RNN [30]. Comprehensive information about each of the four proposed models is presented in the following subsections.

VGG-19 + CNN model

To classify GI disorders with WCE images, a model that consists of a pre-trained Vgg-19 model followed by CNN layers has been constructed. Figure 4 provides a visual representation of the architecture of the proposed Vgg 19 and CNN model.

Fig. 4

Proposed Vgg-19 + CNN model

The proposed model comprises the input WCE images, feature extraction, and classification. An image with fixed dimensions of 299 × 299 × 3 is used as the model's input layer. Vgg-19 is applied for feature extraction, followed by two CNN units that serve as feature extraction segments. A convolution (Conv) layer plus a rectified linear unit (ReLU) make up the model's first CNN block, while the second CNN block contains two Conv layers and two ReLU layers. The MaxPooling layer comes next, followed by a batch normalization layer and, finally, a dropout layer, as seen in Fig. 4. To begin the classification process, a 1-dimensional data vector is generated by the flattening layer from the feature extraction output. The classification unit consists of dense layers with 512 neurons and a dropout layer. With the dense layers and the softmax activation function, the output image is categorized into one of four categories, normal, ulcerative colitis, polyps, or dyed-lifted polyps, depending on the presence or absence of GI disease. Table 2 provides a summary of the proposed Vgg-19 + CNN model.
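A minimal Keras sketch of this architecture follows. The paper specifies the layer types, the 299 × 299 × 3 input, and the 512-neuron dense layer; the filter counts of the added Conv layers and the dropout rates are our assumptions, and `weights=None` is used so the sketch builds without downloading the ImageNet weights the pre-trained backbone would use.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_vgg19_cnn(num_classes=4):
    # Vgg-19 backbone as feature extractor (paper: ImageNet pre-trained).
    base = VGG19(weights=None, include_top=False, input_shape=(299, 299, 3))
    x = base.output
    # CNN block 1: one Conv + ReLU (filter count assumed).
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # CNN block 2: two Conv + ReLU layers.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)               # rate assumed
    x = layers.Flatten()(x)                  # 1-D feature vector
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.2)(x)               # rate assumed
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)
```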

Table 2 Model summary of proposed Vgg-19 + CNN model

The model has a total of 23,773,409 parameters, of which 23,773,154 are trainable and 255 are non-trainable. The trainable parameters are updated and optimized during training, whereas the non-trainable parameters are not; in other words, the non-trainable parameters are not adjusted while the classifier is being trained.

ResNet-152 V2

As shown in Fig. 5, the second model applies the pre-trained ResNet-152 V2 to extract features from the WCE images. This model already incorporates initial weights, which makes it more effective than a conventional CNN at achieving acceptable accuracy in a shorter amount of time [30, 31]. The architecture consists of a reshape layer (9 × 9 × 3), a flatten layer, a dropout rate of 0.20, a dense layer with 256 neurons, and a final dense layer with a SoftMax function to classify the WCE image into its corresponding category: normal, ulcerative colitis, polyps, or dyed-lifted polyps. The detailed ResNet-152 V2 model summary is given in Table 3.
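This classifier head can be sketched on top of Keras' stock ResNet-152 V2 backbone (whose 299 × 299 input yields a 10 × 10 × 2048 feature map, flattened here); `weights=None` keeps the sketch light where the paper's setup starts from ImageNet weights.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet152V2

def build_resnet152v2(num_classes=4):
    # ResNet-152 V2 backbone (paper: ImageNet pre-trained).
    base = ResNet152V2(weights=None, include_top=False,
                       input_shape=(299, 299, 3))
    x = layers.Flatten()(base.output)        # flatten the final feature map
    x = layers.Dropout(0.2)(x)               # dropout rate 0.20
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)
```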

Fig. 5

Proposed ResNet-152 V2 model architecture

Table 3 Model summary of ResNet-152 V2 model

The model has a total of 81,288,459 parameters, of which 81,133,604 are trainable and 154,855 are non-trainable.

Gated recurrent unit (GRU) + ResNet-152 V2 model

The third model again uses the pre-trained ResNet-152 V2, which can expedite the training process and swiftly converge to considerable precision because it has already been trained. Therefore, placing this model before RNN techniques can assist in the development of effective and significant models that produce satisfactory classification accuracy. The GRU, in turn, is a well-known RNN design that can retain information relevant to the classification for an extended length of time without discarding it. In addition, GRU possesses a variety of other advantages, such as its ease of use, its versatility, and the smaller amount of time needed for training. Because of these characteristics, GRU is well suited and effective for a wide range of applications, including the present one. As shown in Fig. 6, a feature extraction approach was implemented based on the ResNet-152 V2 model in conjunction with GRU. The ResNet-152 V2 + GRU model has the following layers: a reshape layer (9, 9, 2048), a time-distributed layer (9, 15,447), a flatten layer, a dropout of 0.20, a dense layer containing 128 neurons, a GRU with 512 units, and a final dense layer with a SoftMax activation function to classify the WCE image into one of the GI disease classes.
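A sketch of the ResNet-152 V2 + GRU combination follows; a `bidirectional` flag yields the Bi-GRU variant of the next subsection. Treating each spatial position of the backbone's final feature map as one time step is our assumption about how the reshape feeds the GRU, and the layer ordering beyond the sizes named above (0.20 dropout, 128-neuron dense, 512-unit GRU) is illustrative.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet152V2

def build_resnet152v2_gru(num_classes=4, bidirectional=False):
    base = ResNet152V2(weights=None, include_top=False,
                       input_shape=(299, 299, 3))
    h, w, c = base.output_shape[1:]
    # Feature map -> sequence: one time step per spatial position.
    x = layers.Reshape((h * w, c))(base.output)
    gru = layers.GRU(512)
    x = (layers.Bidirectional(gru) if bidirectional else gru)(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)
```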

Fig. 6

Proposed ResNet-152 V2 model followed by GRU

Table 4 provides a comprehensive summary of the ResNet-152 V2 architecture followed by the GRU. The pre-trained ResNet-152 V2 + GRU model has a total of 91,806,574 parameters, of which 91,551,719 are trainable and 254,855 are non-trainable.

Table 4 Model summary of ResNet-152 V2 and GRU

ResNet152V2 + Bi-GRU

As depicted in Fig. 7, the fourth and final proposed model uses the pre-trained ResNet152V2 as a feature extractor in conjunction with Bi-GRU. The model includes a reshape layer (9, 9, 2048), a dropout layer with a rate of 0.20, a Bi-GRU layer with 512 units, and a final dense layer with a Softmax activation function to classify the WCE image as belonging to one of our four GI classes.

Fig. 7

Proposed ResNet-152 V2 model followed by Bi-GRU

Table 5 provides a comprehensive description of the ResNet-152 V2 architecture followed by the Bi-GRU. This model has a total of 92,970,253 parameters, of which 92,715,398 are trainable and 254,855 are non-trainable.

Table 5 Model summary of ResNet-152 V2 and Bi-GRU

Performance evaluation metrics

The accuracy (ACC), sensitivity (SEN, also known as recall), specificity (SPC), negative predictive value (NPV) [32], positive predictive value (PPV, also called precision) [33], area under the curve (AUC), F1 score, and Matthew's correlation coefficient (MCC) [30] were the metrics used to evaluate the performance of the ulcerative colitis, polyps, and dyed-lifted polyps classification models. Accordingly, a confusion matrix was established for each of the four models. The ACC, given by Eq. (1), is the proportion of WCE images that are predicted correctly out of the total number of images.

$$ ACC = \frac{TP + TN}{{TP + TN + FP + FN}} $$
(1)

True positive and false positive counts are denoted by TP and FP, respectively; TN and FN are the true negative and false negative counts. Equation (2) calculates the SEN, the proportion of positive images predicted as positive out of the total number of positive images.

$$ SEN = \frac{TP}{{TP + FN}} $$
(2)

Equation (3) determines the SPC, the proportion of WCE images that are actually negative and predicted as negative out of the total number of negative images.

$$ SPC = \frac{TN}{{TN + FP}} $$
(3)

Equation (4) gives the PPV, the proportion of images that are truly positive among all images predicted to be positive.

$$ PPV\, = \frac{TP}{{TP + FP}} $$
(4)

The NPV, shown in Eq. (5), is the proportion of images that are truly negative among all images predicted to be negative.

$$ NPV = \frac{TN}{{TN + FN}} $$
(5)

The F1 score is the harmonic mean of PPV and recall, determined using Eq. (6):

$$ F1{\text{-}}score = 2*\left(\frac{PPV*SEN}{{PPV + SEN}}\right) $$
(6)

Finally, the MCC measures the overall quality of the classification model, as given in Eq. (7).

$$ MCC = \frac{(TP*TN) - (FP*FN)}{{\sqrt {(TP + FP)*(TP + FN)*(TN + FP)*(TN + FN)} }} $$
(7)

In addition, current researchers advocate the use of a confusion matrix in model validation [34] because it is a reliable method for characterizing data relationships and distributions, and it provides additional detail for assessing the classifiers. Hence, a confusion matrix is used to examine our models, as shown in Fig. 8.

Fig. 8

Confusion matrix for the present study

The parameters declared in the confusion matrix may be derived from Table 6.

Table 6 Description of the parameters used in a confusion matrix

Using these parameters, the TP, TN, FP, and FN variables can be defined as shown in Table 7.

Table 7 TP, TN, FP, and FN variables equations
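For a multi-class confusion matrix of the kind used here (rows = actual, columns = predicted), the per-class TP, FP, FN, and TN of Table 7 are conventionally derived one-vs-rest. The helper below is our own illustration of that derivation, not the paper's code.

```python
def per_class_counts(conf, k):
    """One-vs-rest TP, FP, FN, TN for class k of confusion matrix conf."""
    total = sum(sum(row) for row in conf)
    tp = conf[k][k]                                  # diagonal entry
    fp = sum(row[k] for row in conf) - tp            # column sum minus TP
    fn = sum(conf[k]) - tp                           # row sum minus TP
    tn = total - tp - fp - fn                        # everything else
    return tp, fp, fn, tn
```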

Results and discussion

Using WCE images, four distinct DL models, i.e., Vgg-19 + CNN, ResNet-152V2, GRU + ResNet-152 V2, and ResNet152V2 + Bi-GRU, were developed for classifying GI conditions such as ulcerative colitis, polyps, and dyed-lifted polyps. The grid search method was utilized to fine-tune these four DL models by adjusting their hyperparameters, such as the number of epochs, batch size, and learning rate. All models were trained for up to 100 epochs with a batch size of 32. Using a stochastic gradient descent (SGD) optimizer with a momentum of 0.8, the initial learning rate of these DL models was set to 0.05 to maximize accuracy. The learning rate was dropped by a factor of 0.1 when no progress was observed for 20 epochs of training; the reason behind this is to prevent the models from overfitting [29, 35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66]. The classification performance of the DL models was observed in terms of the confusion matrix, ACC, SEN (recall), SPC, NPV, PPV, AUC, F1 score, and MCC.
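In Keras terms, the training configuration described above can be sketched as follows. `ReduceLROnPlateau` monitoring validation loss is our reading of the plateau-based learning-rate drop, and `model`, `train_gen`, and `val_gen` are placeholders.

```python
import tensorflow as tf

# SGD with momentum 0.8 and an initial learning rate of 0.05.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.8)

# Drop the learning rate by a factor of 0.1 after 20 epochs without progress.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=20, verbose=1)

# model.compile(optimizer=optimizer,
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_gen, validation_data=val_gen,
#           epochs=100, callbacks=[reduce_lr])
```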

Experimental process

The DL models used in this study were deployed with the help of the Keras library, and the components not directly related to the CNNs were written in Python. The experiments were performed on a Windows-based operating system with an NVIDIA GeForce GTX GPU with 11 GB of memory and 32 GB of RAM.

Result analysis

Figure 9 presents the training and validation accuracy of the four DL models over 100 epochs. The Vgg-19 + CNN obtained a training accuracy of 0.9947 and a validation accuracy of 0.9910; its training and validation losses were 0.2170 and 0.0839, respectively. The ResNet-152V2 model attained a training accuracy of 0.9901 and a validation accuracy of 0.9899. The GRU and ResNet-152 V2 model achieved training and validation accuracies of 0.9888 and 0.9799, respectively. In addition, ResNet152V2 and Bi-GRU obtained a training accuracy of 0.9810. These values show that the Vgg-19 + CNN model trained well and could correctly classify GI diseases versus normal cases.

Fig. 9
figure 9

Training and validation accuracy, and training and validation loss, of all DL models; a Vgg-19 + CNN; b ResNet-152V2; c ResNet152V2 + GRU; d ResNet152V2 + Bi-GRU

Several performance metrics were considered for evaluating the four DL models used in the present study. Figure 10 presents the confusion matrices of Vgg-19 + CNN, ResNet-152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU. The independent test set contained 300 ulcerative colitis, 518 polyps, 450 dyed-lifted polyps, and 1395 normal WCE images. In each confusion matrix, actual classes are arranged in rows and predicted classes in columns.
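The confusion-matrix layout described above (actual classes indexing rows, predicted classes indexing columns) can be built directly from paired label lists; the sketch below is a minimal pure-Python illustration, not the authors' implementation.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a confusion matrix as nested lists: rows index the
    actual class, columns index the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix
```

With the four labels used in this study (`"ulcerative colitis"`, `"polyps"`, `"dyed-lifted polyps"`, `"normal"`), the diagonal entries count correct predictions and off-diagonal entries count misclassifications.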

Fig. 10
figure 10

Confusion matrix of a Vgg-19 + CNN, b ResNet-152V2, c ResNet152V2 + GRU, and d ResNet152V2 + Bi-GRU

For the Vgg-19 + CNN model, among the 300 ulcerative colitis WCE images, 297 were detected accurately and 3 were misclassified as polyps. For polyps, the Vgg-19 + CNN model correctly detected 515 WCE images and misclassified a total of 3 images as normal, ulcerative colitis, and dyed-lifted polyps. Furthermore, this model correctly classified 448 dyed-lifted polyp images, with 2 images misclassified as ulcerative colitis. For ResNet152V2, among the 300 ulcerative colitis WCE images, the model identified 292 correctly and misclassified 3 as polyps and 5 as dyed-lifted polyps. ResNet152V2 also correctly assigned the class labels of 506 polyp images, 1377 normal images, and 444 dyed-lifted polyp images. ResNet152V2 + GRU predicted the correct class label for 295 ulcerative colitis WCE images and misclassified 5 as normal, polyps, or dyed-lifted polyps; it also correctly classified 509 polyp images, 1379 normal images, and 447 dyed-lifted polyp images. For ResNet152V2 + Bi-GRU, among the 300 ulcerative colitis WCE images, 288 were detected accurately and a total of 12 were misclassified as normal, polyps, or dyed-lifted polyps. Among the 1395 normal WCE images, 1373 were detected as healthy cases and 22 were misclassified as ulcerative colitis, polyps, or dyed-lifted polyps. Moreover, 501 polyp and 439 dyed-lifted polyp cases were correctly identified by ResNet152V2 + Bi-GRU.

The evaluation parameters, including loss, ACC, SEN, SPC, NPV, PPV, AUC, F1 score, and MCC, for all four DL models are shown in Table 8. The Vgg-19 + CNN achieved 99.45% ACC, 98.90% SEN, 99.75% SPC, 98.98% PPV, 99.74% NPV, 0.9810 MCC, a 98.84% F1 score, and a loss of 0.2170 for the automatic diagnosis of normal, ulcerative colitis, polyps, and dyed-lifted polyps using WCE images. ResNet152V2 also achieved significant results: an ACC of 96.31%, SEN of 96.41%, SPC of 99.40%, PPV of 96.35%, MCC of 0.9580, and a loss of 0.3380. For the ResNet152V2 + GRU model, the ACC was 97.19%, the SEN was 97.19%, the PPV was 97.22%, and the F1 score was 97.09%. Finally, ResNet152V2 + Bi-GRU achieved 95.37% ACC, 95.16% SEN, 0.9500 MCC, and a 95.26% F1 score.
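The per-class metrics reported above all follow from the one-vs-rest confusion-matrix counts (TP, FP, FN, TN). As an illustrative sketch (not the authors' code), the standard definitions can be computed as follows:

```python
from math import sqrt

def class_metrics(tp, fp, fn, tn):
    """One-vs-rest evaluation metrics for a single class, computed
    from confusion-matrix counts: true/false positives and negatives."""
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spc = tn / (tn + fp)                    # specificity
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    npv = tn / (tn + fn)                    # negative predictive value
    acc = (tp + tn) / (tp + fp + fn + tn)   # accuracy
    f1 = 2 * ppv * sen / (ppv + sen)        # harmonic mean of PPV and SEN
    mcc = ((tp * tn - fp * fn) /
           sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"SEN": sen, "SPC": spc, "PPV": ppv,
            "NPV": npv, "ACC": acc, "F1": f1, "MCC": mcc}
```

For a multi-class problem such as this one, these quantities are typically computed per class and then averaged across the four classes.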

Table 8 Performance comparison of the proposed model with pre-trained models

Table 8 shows that the Vgg-19 + CNN model produced superior results compared with the three other DL classifiers, ResNet152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU. One reason is that these classifiers are built from very deep networks whose final convolutional layers greatly reduce the spatial resolution of the feature map, which lowers classification performance. Furthermore, the filter sizes of these classifiers are not well suited to this problem, so crucial features are ignored and the neurons' receptive fields do not cover enough of the input. The Vgg-19 + CNN model mitigates the issues of reduced spatial resolution and overlap in the infected, colored regions of GI bleeding WCE images. It also benefits from an adjusted filter size and a faster convergence process, which reduce the detrimental impact of structured noise and improve classification performance. A model is regarded as suitable and effective if it achieves the highest AU(ROC) value; the AU(ROC) is computed from the true-positive and false-positive rates. The AU(ROC) curves of the Vgg-19 + CNN, ResNet-152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU classifiers are shown in Fig. 11.

Fig. 11
figure 11

AU(ROC) curves of a Vgg-19 + CNN; b ResNet-152V2; c ResNet152V2 + GRU; and d ResNet152V2 + Bi-GRU

The AU(ROC) of the Vgg-19 + CNN model was 0.9953, while the AU(ROC) values of the ResNet-152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU classifiers were 0.9598, 0.9907, and 0.9305, respectively. We believe that the Vgg-19 + CNN model would help clinical experts classify ulcerative colitis, polyps, dyed-lifted polyps, and normal cases using WCE images.
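For a single class against the rest, the AU(ROC) equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann-Whitney U statistic). A minimal sketch of that equivalence, offered as an illustration rather than the authors' implementation:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC via the rank statistic: the fraction of (positive, negative)
    score pairs ranked correctly, with ties counted as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

In practice, multi-class AU(ROC) values such as those above are usually obtained by computing this one-vs-rest quantity for each class (e.g., with `sklearn.metrics.roc_auc_score`) and averaging.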

Comparison with state-of-the-art classifiers

Table 9 presents a detailed comparison of the designed DL models with current state-of-the-art classifiers in terms of accuracy, precision, recall, and F1 score.

Table 9 Comparison of four DL models with state-of-the-art classifiers

Discussion

WCE images are generally used in the diagnostic process for diseases that affect the GI system. WCE creates a detailed image of a particular region, which helps detect irregularities of the stomach as well as illnesses located within it. WCE is a faster and more cost-effective approach than standard endoscopy for identifying disorders such as ulcerative colitis, polyps, intestinal bleeding, and dyed-lifted polyps, because it does not require a flexible endoscope [30]. These conditions can manifest in any area of the GI tract, from the mouth down to the anus. Originally, an endoscopic procedure was carried out to examine the large intestine (also referred to as the colon) and the rectum in search of abnormalities or changes [77]. During an endoscopy, a long, thin tube with a flexible tip is inserted into the rectum [78]. Endoscopies are, however, linked to a small number of potential risks, such as an adverse reaction to the anesthetic used during the examination, bleeding at the site where a tissue sample (biopsy) is collected, or complications from the removal of a polyp [79]. Consequently, a WCE procedure was utilized to examine the stomach. WCE is a procedure in which photographs of a patient's GI tract are taken with the assistance of a wireless camera [80]. The patient swallows a vitamin-sized capsule containing the camera used for the procedure [81]. The patient wears a recorder on a belt around the waist, and as the capsule passes through the digestive tract it collects thousands of pictures along the route, which are then transmitted to the belt-worn recorder.
Some sections of the small intestine cannot be reached with traditional endoscopy; capsule endoscopy, however, makes it possible for medical experts to examine these areas [82]. An accurate automatic classifier was necessary to handle the millions of photos acquired by the WCE camera and to identify disorders such as ulcerative colitis, polyps, and dyed-lifted polyps. DL approaches make it possible to classify GI cases automatically from WCE images. We therefore designed four CNN-based models that effectively classify ulcerative colitis, polyp, and dyed-lifted polyp WCE images, which can help clinical experts start the treatment process for stomach-infected individuals at an early stage. According to the experimental work above, the Vgg-19 + CNN model was effectively trained on the ulcerative colitis, polyp, and dyed-lifted polyp infections that appear in the GI tract and appropriately diagnoses these infected WCE images. In the comparison of the classification performance of the four DL classifiers, the Vgg-19 + CNN achieved remarkable classification accuracy of 99.45% in detecting GI diseases from WCE images. The Vgg-19 + CNN, ResNet-152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU models were trained on the datasets with a fixed image resolution of 299 × 299 × 3. In addition, the cross-entropy loss was applied in the training process of each of the four DL models used in this work. Table 8 compares the classification performance of the DL classifiers investigated in this study in terms of loss, PPV, MCC, accuracy, precision, recall, and F1 score. The Vgg-19 + CNN model achieved outstanding performance, with the highest accuracy of 99.45%, specificity of 99.75%, F1 score of 98.84%, MCC of 0.9810, recall of 98.90%, and AU(ROC) of 0.9953.
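The cross-entropy loss mentioned above, for a one-hot class label and a predicted (softmax) probability vector, reduces to the negative log-probability assigned to the true class. A small illustrative sketch of that definition (not the authors' training code):

```python
from math import log

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label vector and predicted
    class probabilities; eps guards against log(0)."""
    return -sum(t * log(max(p, eps)) for t, p in zip(y_true, y_pred))
```

During training, this quantity is averaged over a batch and minimized, which drives the predicted probability of the true class (e.g., "polyps" among the four labels here) toward one.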
The classification performance of the other three DL classifiers was marginally lower. The ResNet152V2 + GRU classifier achieved a better AU(ROC) score of 0.9907 than ResNet-152V2 and ResNet152V2 + Bi-GRU, whose AU(ROC) values were 0.9598 and 0.9305, respectively. ResNet152V2 + Bi-GRU produced the lowest AU(ROC) score (0.9305) among all the models, with an accuracy of 95.37%, recall of 95.16%, specificity of 98.37%, F1 score of 95.26%, MCC of 0.9500, PPV of 95.35%, and NPV of 98.80%. In the majority of instances, the choice of CNN-based pre-trained models did not affect the overall binary classification problem; these models, on the other hand, perform significantly better on tasks such as segmentation or the detection of a wide variety of diseases [83, 84]. In addition, the majority of studies [85,86,87,88] hold that the performance of these classifiers does not improve when the number of CNN layers used for binary classification tasks is increased [89].

The classification accuracy of the DL models utilized in this study is compared with modern state-of-the-art classifiers in Table 9. The evaluation of the experimental results against contemporary cutting-edge methodologies demonstrates that the Vgg-19 + CNN model for detecting ulcerative colitis, polyps, intestinal bleeding, and dyed-lifted polyps in WCE images adds significant value to the health professional's toolkit in the form of improved diagnostic accuracy. Majid et al. [39], Khan et al. [40], Kundu et al. [67], Pozdeev et al. [68], Naz et al. [51], Furqan et al. [56], and Pannu et al. [71] reported overall classification accuracies for CNN-based pre-trained models and ML classifiers on GI tract diseases using WCE images of 95.5% (CNN), 98.13% (SVM), 95.09% (least-square saliency transformation), 88.00% (CNN), 99.30% (ensemble model), 99.3% (CNN), and 95.0% (CNN), respectively. Caroppo et al. [57] developed a deep pre-trained model for bleeding detection in endoscopy images, achieving a significant accuracy of 97.71%. A Vgg-16 and CNN model was designed by Park et al. [72] for detecting gastric infections in endoscopic biopsies from WCE images; it achieved an overall accuracy of 98.40% in detecting stomach diseases. In Ref. [73], Vgg-16 and an SVM were used to classify ulcer bleeding from WCE images, achieving an average accuracy of 98.4%. Li et al. [74] used a CNN model for the detection of gastric anomalies; their model predicted gastric-infected images with a 90.91% accuracy rate. Owais et al. [75] proposed a DL algorithm based on ResNet and LSTM for the automatic diagnosis of ulcers and Crohn's disease from endoscopy images, obtaining a classification accuracy of 97.05%.

The findings presented in Table 8 demonstrate that Vgg-19 + CNN is more capable of finding anomaly patterns and extracting the dominant, discriminative features when identifying the variety of stomach diseases in WCE image samples, with an accuracy of 99.45%. Table 8 also reports the results of the remaining three DL classifiers. We offer a comprehensive explanation for the poorer classification performance of prior work, together with an examination of the characteristics of the ulcerative colitis, polyp, and dyed-lifted polyp WCE images that form the classification challenge. CNN-based pre-trained classifiers consist of deep networks whose last convolutional layers reduce the spatial resolution of the feature map, which diminishes the models' capacity for categorization. In addition, the filter size of these pre-trained classifiers is inadequate because the number of neurons coupled to the input is high; as a direct consequence, important portions of the data are disregarded. These issues can be addressed with the Vgg-19 + CNN model. The Vgg-19 model was integrated with a CNN to detect multiple stomach ailments using WCE images gathered from several different databases. The proposed Vgg-19 + CNN model corrects the color loss and overlapping that occur in the sore regions of WCE photos. This approach enhances classification performance by increasing the rate of convergence while significantly lessening the adverse influence of structured noise. The findings indicate that the Vgg-19 + CNN approach for multi-class classification of GI disorders from WCE images, such as ulcerative colitis, polyps, and dyed-lifted polyps, provides considerable and appropriate support to healthcare specialists.

Conclusion

In the current study, multi-classification DL models were developed and assessed for categorizing ulcerative colitis, polyps, and dyed-lifted polyps from WCE images. To our knowledge, this is the first attempt to classify these three GI disorders within a single DL model. It is essential to diagnose these conditions accurately as early as possible to begin proper therapy and shield patients from potentially serious repercussions. We presented four different DL classifiers, namely VGG19 + CNN, ResNet152V2, ResNet152V2 + GRU, and ResNet152V2 + Bi-GRU. The VGG19 + CNN model outperformed the other three proposed DL models in extensive experiments on datasets acquired from numerous sources, comprising normal, ulcerative colitis, polyp, and dyed-lifted polyp WCE images. Based on WCE images, the VGG19 + CNN model achieved 99.45% accuracy, 98.90% SEN, 99.75% SPC, 98.98% PPV, 99.74% NPV, a 98.84% F1 score, 0.9810 MCC, and 0.9953 AUC. Ongoing research aims to improve the performance of the suggested method by expanding the number of images in the datasets, increasing the number of training epochs, and employing federated learning techniques for classification.