3D-semantic segmentation and classification of stomach infections using uncertainty aware deep neural networks

Wireless capsule endoscopy (WCE) moves through the human body and captures video of the small bowel. Because every frame of the video must be analyzed, the diagnosis of gastrointestinal infections by the physician is a tedious task. This tiresome assignment has fuelled researchers' efforts to present automated techniques for the detection of gastrointestinal infections. The segmentation of stomach infections is a challenging task because the lesion region has low contrast and an irregular shape and size. To handle this challenging task, a new deep semantic segmentation model is suggested in this research work for 3D-segmentation of the different types of stomach infections. In the segmentation model, DeepLabv3 is employed with ResNet-50 as its backbone. The model is trained with ground-truth masks and accurately performs pixel-wise classification in the testing phase. Because of the similarity among the different types of stomach lesions, accurate classification is a difficult task, which is addressed in this research by extracting deep features from global input images using a pre-trained ResNet-50 model. Furthermore, the latest advances in the estimation of uncertainty and model interpretability in the classification of different types of stomach infections are presented. The classification results estimate the uncertainty related to the vital features in the input and show how uncertainty and interpretability might be modeled in ResNet-50 for the classification of the different types of stomach infections. The proposed model achieved prediction scores of up to 90%, which authenticates the performance of the method.


Introduction
An ulcer is a disorder of the gastrointestinal (GI) tract; about 10 percent of people have this condition. It is a chronic inflammatory erosion or sore on the inner portion of the mucous membrane [1,2]. An ulcer itself is not fatal, but its symptoms are those of serious ailments, i.e., Crohn's disease and ulcerative colitis, which might cause death at a complicated stage [3]. Stomach ulcers are sores in the lining of the stomach and the duodenum. Up to 4 million people develop stomach ulcers in the United States per year (i.e., 1 out of 10 people) [4].
Conventional imaging protocols for ulcers are sonde and push endoscopy [5]. In the inspection process, the endoscope is inserted through the anus or mouth of the patient by an experienced doctor to analyze the GI tract [6]. These traditional methodologies play a vital role in analyzing the lower and upper ends of the GI tract [7]. Wireless capsule endoscopy (WCE) is an alternative method that offers painless, non-invasive, and direct inspection of the small bowel [8]. A commercially available WCE capsule comprises an optical dome, an illumination part, batteries, and imaging sensors [9,10]. WCE captures 2-4 images per second for nearly 8 h within the patient's GI tract and transmits them wirelessly to a recording device attached to the patient's waist [11]. Physicians can download and examine all photographs off-line for diagnostic purposes [12]. WCE creates approximately 55,000 images per patient, of which 5% of the whole collected WCE images are normal; examining them is a time-consuming and exhausting assignment for physicians [13]. Thus, it is important to develop an automated approach to analyze the ulcer images so that the physician's workload is reduced [14]. Texture features [15] play an essential role in differentiating between healthy and ulcer images. The bidimensional EMD (BEEMD) method is used to classify normal/ulcer images [16]. The curvelet-based lacunarity (DCT-LAC) technique and a multi-level super-pixel approach are used for ulcer detection [17].
Detection of stomach infections at an initial stage may help to reduce the risk of mortality. Manual evaluation of stomach infections is a laborious and time-consuming task in contrast to the computerized methods used for the analysis of the stomach. Segmentation and classification of stomach lesions are performed using conventional and deep learning methodologies. Hand-crafted features are selected in classical approaches, whereas deep learning methods can learn to extract informative features within the pipeline. Although a considerable amount of work has been done in this domain, accurate detection of stomach lesions is still challenging [18]. This research work is based on two phases. In Phase I, DeepLabv3 is utilized with the pre-trained ResNet-50 model as its base network to overcome the existing limitations. For accurate segmentation, the model is trained with hyperparameters selected after extensive experimentation. In Phase II, the ResNet-50 model is trained using input images, and the classification outcomes are analyzed using uncertainty based on thresholding and a Bayesian neural network to authenticate the prediction accuracy. The foremost contribution steps of the proposed model are as follows: Phase I: DeepLabv3 with a pre-trained ResNet-50 model for feature mapping is developed for precise lesion

Related work
Much work has been devoted to developing automated approaches for ulcer detection [18-22]. Some of the latest existing techniques are discussed in this section. Stomach ulcer segmentation is a big challenge because endoscopy images have low contrast, illumination, and brightness issues; thus, the infected region is not segmented accurately [23]. The classification of the different types of gastrointestinal infections is also an intricate task [24], because it relies on the feature extraction framework, which directly affects the classification accuracy [25]. An automated system has been presented to process WCE images for early detection [26], where features are extracted, combined into a single vector, and subsequently fed to the classifiers for ulcer/bleeding classification [27]. This approach achieved an accuracy of 92.86% and 93.64% on bleeding and ulcers, respectively [28]. Another approach has been presented for the detection of infection in the stomach and achieved an accuracy of 98.3% [29]. A method has been trained using the pre-trained ResNet-101 model; the extracted features are optimized using the grasshopper method and passed to an SVM for classification of different types of infections in the stomach, such as polyp, bleeding, and ulcer [25,30]. The quality of the input images has been improved by applying contrast enhancement; classical and deep features [31] are extracted, and the best are selected using entropy [32-34]. These best features are passed as input to the classifiers, among which KNN outperforms the other classifiers and achieves 99.42% accuracy [25]. Deep features have been extracted from transfer learning models such as AlexNet and GoogLeNet for ulcer classification [35]. A saliency-based segmentation approach has been employed for ulcer segmentation [36]. The Hidden Markov Model (HMM) has been applied for the detection of stomach ulcers on two datasets [37].
An automated system has been presented which comprises the HSI and YIQ color transformations [38,39] and feature fusion using singular value decomposition [40]; finally, classification is performed based on the extracted features [41]. The least square saliency transformation with a probabilistic fitting model has been employed for the classification of stomach ulcers [42]. A weakly supervised neural model has been utilized for stomach ulcer detection. Features extracted from the VGG model [43] are passed as input to the classifiers for gastric ulcer classification [44]. A classical deep model has been utilized to classify 5560 WCE images into ulcer and erosion/normal classes, and it achieved an accuracy of 90.8% [45]. A CNN model has been employed for the classification of different types of stomach lesions such as ulcers, bleeding, and polyps with 72 and 71 percent specificity and sensitivity, respectively [46]. The GDP network has been utilized for stomach ulcer classification with 88.9% accuracy [47]. The HA network with a residual model has been employed for stomach infection detection and achieved 91% accuracy [48].
In the literature, extensive studies have been performed on the detection of different types of stomach infections; however, there is still a gap in this domain because stomach lesions appear in variable shapes and sizes [14,48,58]. The selection of the learning parameters of CNN models, i.e., the optimization function, learning rate, and batch size, is still a challenge that directly affects the classification accuracy. Pre-trained models such as GoogLeNet and AlexNet trained on stomach ulcer datasets with a learning rate of 0.01 do not provide satisfactory classification outcomes [35]. MCNet does not provide accurate lesion segmentation due to unclear boundaries between the infected and the healthy regions [49].

Proposed methodology
A modified model is presented for the detection of gastrointestinal infections. The technique comprises two major phases, as shown in Fig. 1. In phase I, the infected stomach region is segmented with ground truth using a modified semantic segmentation model, whereas in phase II, a pre-trained ResNet-50 model is presented for the classification of different types of gastrointestinal infections such as bleeding, ulcer, polyps, and normal stomach images.

Semantic segmentation of stomach ulcer
In the proposed model, the DeepLabv3+ network [51] is utilized, in which the CNN employs an encoder-decoder structure, skip connections, and dilated convolutions. ResNet-50 is used as the base network of DeepLabv3+ for stomach infection segmentation. The semantic segmentation model comprises 206 layers, which include 1 input, 62 convolutional, 65 batch-normalization, 32 ReLU, 2 crop2d, 1 max-pooling, softmax, and pixel classification layers. The layers of the proposed semantic model are depicted in Fig. 2.
The hyperparameters of the proposed semantic segmentation model are given in Table 1. Table 1 presents the model-building hyperparameters: 100 epochs, the SGD optimizer, a learning rate of 0.001, and a batch size of 16 are utilized for model training because they yield the maximum accuracy. Figure 3 shows the segmented stomach lesions with ground-truth masks.

Uncertainty estimation based on ResNet-50
In the medical domain, disease grading through computerized systems is very helpful for the gastroenterologist; at the same time, it has become complicated due to the increase in the size of the patients' data. Currently, a convolutional neural network performs a vital role. The model training is performed on the selected hyperparameters mentioned in Table 3, and the categorical cross entropy is defined as

$$\text{Categorical cross entropy} = -\sum_{i}^{C} t_i \log(s_i) \qquad (1)$$

where $t_i$ and $s_i$ denote the label and the CNN score of each class $i$ in $C$. The 60:40 ratio is utilized for model training and testing. The description of the model with the number of layers and selected neurons is given in Table 2.
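As an illustration of Eq. (1), the following minimal Python sketch evaluates the categorical cross entropy for a single sample. The function name, the `eps` guard against log(0), and the toy three-class values are assumptions made for this example; they are not taken from the reported experiments, which were run in MATLAB.

```python
import math

def categorical_cross_entropy(targets, scores, eps=1e-12):
    """Return -sum_i t_i * log(s_i) for one sample.

    targets: one-hot label vector t (length C)
    scores:  softmax scores s (length C)
    eps:     small constant (an assumption) that guards against log(0)
    """
    if len(targets) != len(scores):
        raise ValueError("targets and scores must have the same length")
    return -sum(t * math.log(max(s, eps)) for t, s in zip(targets, scores))

# Toy three-class example; the numbers are illustrative only.
t = [0.0, 1.0, 0.0]   # the true class is the second one
s = [0.1, 0.8, 0.1]   # hypothetical CNN softmax scores
loss = categorical_cross_entropy(t, s)
print(round(loss, 4))  # equals -log(0.8)
```

Because the label vector is one-hot, only the score of the true class contributes, so the loss shrinks toward zero as that score approaches 1.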

Performance improvement via uncertainty-aware stomach infections classification
The uncertainty of the classification model is used for estimating the prediction in two ways: (i) probability estimation based on thresholding and (ii) probability estimation based on the Bayesian neural network (BNN).
In the thresholding method, the complete data is randomly split into training and testing parts. The threshold value is computed for each class label.
In the BNN, a dataset $\mathcal{D} = \{x_n \in \mathbb{R}^{D}, y_n \in \mathbb{R}^{C}\}_{n=1}^{N}$ is given, where $x_n$ represents the input feature vector and $y_n$ denotes the one-hot encoded label vector. The predictive distribution of the BNN on a new sample $\{x^*, y^*\}$ might be

$$p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, W)\, p(W \mid x^*, \mathcal{D})\, dW$$

where $W$ represents the weights, $p(y^* \mid x^*, W)$ denotes the softmax function applied to $f^{W}(x^*)$, i.e., the network forward pass, and $p(W \mid x^*, \mathcal{D})$ is the posterior over the weights. The predictive distribution is approximated by Monte Carlo as

$$p(y^* \mid x^*, \mathcal{D}) \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, W_t)$$

where $W_t$ denotes the weights sampled in the $t$-th pass. The predictive distribution is computed by running $T$ forward passes of the model with dropout employed to produce $T$ predictions, and the standard deviation is computed over the $T$ softmax sample outputs. The BNN utilizes dropout for sampling from the posterior predictive distribution, which is referred to as Monte Carlo dropout.
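The Monte Carlo dropout procedure described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: a single linear layer followed by softmax stands in for ResNet-50, dropout is applied to the input features, and all names (`stochastic_forward`, `mc_dropout_predict`), the weight matrix, and the values of `T` and `drop_prob` are hypothetical rather than taken from the paper.

```python
import math
import random
import statistics

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(x, weights, drop_prob, rng):
    """One forward pass with dropout kept active at test time."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob, rescale the rest.
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, dropped)) for row in weights]
    return softmax(logits)

def mc_dropout_predict(x, weights, T=100, drop_prob=0.5, seed=0):
    """Average T stochastic softmax outputs; return per-class mean and std."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, drop_prob, rng) for _ in range(T)]
    n_classes = len(weights)
    mean = [statistics.fmean([s[c] for s in samples]) for c in range(n_classes)]
    std = [statistics.pstdev([s[c] for s in samples]) for c in range(n_classes)]
    return mean, std

# Hypothetical 2-class model with 3 input features.
W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
x = [0.6, 0.1, 0.9]
mean, std = mc_dropout_predict(x, W, T=200)
predicted_class = max(range(len(mean)), key=mean.__getitem__)
print(predicted_class, mean, std)
```

The per-class mean plays the role of the Monte Carlo estimate of the predictive distribution, and the standard deviation over the T softmax outputs serves as the uncertainty attached to the prediction.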
The predictions for all stomach infection test images are computed and sorted by their associated uncertainty. At different uncertainty levels, predictions are made for diagnosis, and the prediction accuracy is computed at the specified threshold according to the class labels.
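The sorting and thresholding step can be illustrated with a short Python sketch. It assumes that each test image already has a predicted label, a true label, and a scalar uncertainty (for example, the standard deviation from Monte Carlo dropout); the function name `accuracy_by_uncertainty`, the threshold values, and the toy data are hypothetical and do not reproduce the reported results.

```python
def accuracy_by_uncertainty(predictions, labels, uncertainties, thresholds):
    """For each threshold, keep predictions whose uncertainty is <= threshold
    and report (threshold, number_retained, accuracy_on_retained)."""
    # Sort the test predictions by their associated uncertainty.
    ranked = sorted(zip(uncertainties, predictions, labels))
    report = []
    for th in thresholds:
        kept = [(p, y) for u, p, y in ranked if u <= th]
        acc = sum(p == y for p, y in kept) / len(kept) if kept else None
        report.append((th, len(kept), acc))
    return report

# Toy data for six test images; values are illustrative only.
preds = ["ulcer", "bleeding", "healthy", "ulcer", "healthy", "bleeding"]
truth = ["ulcer", "bleeding", "healthy", "bleeding", "healthy", "ulcer"]
unc = [0.02, 0.05, 0.03, 0.30, 0.08, 0.25]

report = accuracy_by_uncertainty(preds, truth, unc, thresholds=[0.05, 0.10, 0.50])
for th, n, acc in report:
    print(th, n, acc)
```

In this toy case the two misclassified images carry the highest uncertainty, so a stricter threshold retains fewer images but yields a higher accuracy on the retained ones.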

Dataset descriptions
As listed in Table 4, the performance of the proposed method is computed on five benchmark datasets. The privately collected imaging dataset has 30 WCE videos, of which 10 are ulcer videos, 10 are bleeding videos, and the remaining 10 are healthy [41]. Each video contains 500 frames. The CVC-ClinicDB database contains 612 WCE images with annotated ground truth [53]. The Nerthus dataset contains 21 WCE videos with 5525 frames [54]. The Kvasir-segmentation dataset comprises 1000 WCE images with ground-truth masks [55]. The Kvasir-classification dataset contains 4000 images of 8 classes, where each class contains 500 images of different types of stomach infections [56].

Experimental results and discussion
To evaluate the efficiency of the proposed system, two experiments were carried out. Experiment#1 computes the performance of the proposed segmentation model against the ground-truth annotated masks. Experiment#2 analyzes the classification results. All experiments are implemented in MATLAB R2020a on a Core i7 CPU with 32 GB RAM and an 8 GB Nvidia RTX 2070 graphics card.

Experiment#1 evaluation of semantic segmentation
The performance of the proposed semantic segmentation method is evaluated against the ground-truth annotated masks, as given in Table 5. The segmented stomach lesions are compared with the ground-truth annotation masks on three benchmark datasets: Kvasir, the privately collected images, and CVC-ClinicDB. Figures 4, 5, and 6 show that the proposed method segments the stomach infections more precisely. The proposed segmentation results are compared with existing work on the same benchmark datasets, as mentioned in Table 6. Table 6 shows the existing methodologies for segmentation of stomach infections, such as [9,41,52,53,59,66]. In the comparison analysis, the FCN method has been employed in its FCN-8s, FCN-16s, and FCN-32s variants, of which FCN-32s achieved the highest mean accuracy of 0.83 [57]. Seg-network [58] and the dilation model [59] obtained a segmentation accuracy of 0.85, while the U-net model [60] has been employed with different pre-trained networks such as VGG-16, VGG-19, and ResNet-34 [61]. Without any pre-trained network, the plain U-net model achieved a mean segmentation accuracy of 0.86, which is the maximum compared to the variants with pre-trained models. The MCNet [62] model is employed for lesion segmentation with a mean accuracy of 0.84. The comparison reflects that the proposed model, in which DeepLabv3 is used with ResNet-50 as its backbone, attained a mean accuracy of 0.98, which is superior to the recently published work in this domain.
The proposed segmentation results are also compared visually with the U-net [60] model, as seen in Fig. 7. Figure 7 shows that the false positive rate of the U-net segmentation model is increased due to the segmentation of non-lesion pixels, while the proposed segmentation model (DeepLabv3 and ResNet-50) segments the actual stomach lesions more precisely than the other models.
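To make the segmentation measures concrete, the sketch below computes global accuracy, mean accuracy, mean IoU, and weighted IoU from a pixel-level confusion matrix. It assumes the commonly used definitions of these measures (rows are ground-truth classes, columns are predicted classes, and weighted IoU weights each class by its ground-truth pixel count); the function name and the toy two-class matrix are hypothetical, the mean BF-score is omitted because it requires boundary information, and every class is assumed to occur at least once.

```python
def segmentation_metrics(confusion):
    """Compute pixel-level segmentation measures from a confusion matrix.

    confusion[r][c] = number of pixels of true class r predicted as class c.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[c][c] for c in range(n)]
    gt = [sum(confusion[c]) for c in range(n)]                         # pixels per true class
    pred = [sum(confusion[r][c] for r in range(n)) for c in range(n)]  # pixels per predicted class
    class_acc = [tp[c] / gt[c] for c in range(n)]
    # IoU = TP / (TP + FP + FN) = TP / (ground truth + predicted - TP)
    iou = [tp[c] / (gt[c] + pred[c] - tp[c]) for c in range(n)]
    return {
        "global_accuracy": sum(tp) / total,
        "mean_accuracy": sum(class_acc) / n,
        "mean_iou": sum(iou) / n,
        "weighted_iou": sum(g * i for g, i in zip(gt, iou)) / total,
    }

# Toy confusion matrix for background vs. lesion pixels (illustrative only).
confusion = [[90, 10],
             [5, 95]]
metrics = segmentation_metrics(confusion)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Mean accuracy and mean IoU treat every class equally, which matters for lesion segmentation because lesion pixels are usually far fewer than background pixels; global accuracy and weighted IoU are dominated by the larger class.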

Classification results on Nerthus-dataset-frames
The classification is performed on four classes: Grade-1 Bowl, Grade-2 Bowl, Grade-3 Bowl, and Grade-4 Bowl. The classification results are given in Table 7 and Figs. 8, 9. Table 7 shows the probability of the prediction scores based on thresholding, where the overall achieved accuracy is 1.00. The precision rates of the Bowl grades are 1.00, 1.00, 0.99, and 0.99 on Grade 1, Grade 2, Grade 3, and Grade 4, respectively. Similarly, in the same experiment, the prediction rate has been computed using BNN, as presented in Table 7 and Figs. 10, 11.
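For reference, the per-class precision and the overall accuracy reported for the grades can be derived from predicted and true labels as in the following sketch. The function name, the short class labels `g1` to `g4`, and the six toy samples are hypothetical and serve only to show the arithmetic; they are not the Nerthus results.

```python
from collections import Counter

def precision_and_accuracy(y_true, y_pred):
    """Return ({class: precision}, overall_accuracy) for label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    # True positives per class: samples predicted as c that really are c.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    precision = {c: (tp[c] / predicted[c] if predicted[c] else 0.0) for c in classes}
    accuracy = sum(tp.values()) / len(y_true)
    return precision, accuracy

# Toy labels for six frames (illustrative only).
y_true = ["g1", "g1", "g2", "g2", "g3", "g4"]
y_pred = ["g1", "g1", "g2", "g3", "g3", "g4"]
precision, accuracy = precision_and_accuracy(y_true, y_pred)
print(precision, accuracy)
```

Precision for a class is the fraction of frames predicted as that class that truly belong to it, so a single Grade-2 frame predicted as Grade 3 lowers the Grade-3 precision while leaving the Grade-2 precision unchanged.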

Classification results on kvasir-classification dataset
The classification results are computed on eight different kinds of stomach infections. They are also computed using uncertainty based on thresholding and BNN, as given in Tables 9, 10 and Figs. 12, 13.
The classification results on the Kvasir dataset using BNN are given in Table 8 and Figs. 14, 15.

Classification results on private collected images
The classification results on the privately collected images are shown in Figs. 16, 17 and Table 11. The proposed method achieved a testing accuracy of 0.982 with a loss of 0.10917.
On the privately collected images, classification is performed into three classes: bleeding, healthy, and ulcer. The method achieved a cumulative accuracy of 1.00, and the precision rate of each class was 1.00 on healthy, bleeding, and ulcer. Classification results for uncertainty based on BNN are illustrated in Figs. 18, 19 and Table 11.
On the privately collected images, BNN achieved prediction scores of 0.98 on the three classes healthy, bleeding, and ulcer. Similarly, the precision rate was 0.98 on the healthy and bleeding classes and 0.99 on the ulcer class. The classification outcomes are compared with recent approaches, as stated in Table 13.
The proposed classification results are compared with eight existing methodologies. Feature extraction, selection, and fusion approaches have been used for the classification of normal and bleeding WCE images with 0.94 accuracy. The pre-trained AlexNet model has been utilized for the classification of ulcer and erosion [63]. Classical Gabor features with a pre-trained DenseNet have been employed for cancer/normal image classification [64]. Deep features are extracted from WCE images for normal/ulcer classification [65]. Classical and deep features are fused, and informative features are selected for the classification of polyp, ulcer, esophagitis, bleeding, and normal images [22]. The duo-deep model has been employed for the classification of ulcer, polyp, and bleeding [66]. StomachNetwork has been utilized for the classification of polyp, ulcer, and normal images with 0.96 accuracy [67]. The proposed method classified the esophagitis, colon polyps, normal, bleeding, and UC (ulcers) images with the highest prediction scores compared to existing works.
After the comparison analysis, we conclude that no methods exist in the literature for segmentation and classification of the different types of stomach infections that use all publicly available challenging datasets together with privately collected images. This research work investigates a new approach for segmentation and classification of the different types of stomach infections, where Kvasir-SEG, CVC-ClinicDB, and privately collected images with ground-truth masks are used to compute the performance of the segmentation model. For classification, four classes (bowl-grade1, bowl-grade2, bowl-grade3, and bowl-grade4) of the Nerthus dataset, eight classes of the Kvasir dataset, and three classes (ulcer, bleeding, and healthy) of the privately collected images are utilized. The classification outcomes indicate that the proposed methodology is superior to existing works, which authenticates the contribution of the proposed method.

Conclusion
In this research, a new approach is presented for the analysis of stomach infections, which is a difficult job using WCE because the lesions have irregular shapes and sizes. Informative feature extraction is still a challenge because it affects the classification accuracy. Accurate segmentation is performed using a deep semantic segmentation model, where DeepLabv3 is employed with ResNet-50 as its backbone. The proposed modified model segments the stomach lesions with 0.97 mean IoU, 0.98 global accuracy, 0.96 weighted IoU, 0.98 mean accuracy, and 0.98 mean BF-score on CVC-ClinicDB; 1.00 mean IoU, 1.00 global accuracy, 1.00 weighted IoU, 1.00 mean accuracy, and 1.00 mean BF-score on Kvasir-SEG; and 0.99 mean IoU, 0.98 global accuracy, 0.99 weighted IoU, 0.99 mean accuracy, and 0.99 mean BF-score on the privately collected images. The prediction scores of the classification models on each benchmark dataset are computed using uncertainty based on standard thresholding and the Bayesian neural network (BNN). Using uncertainty based on thresholding, the proposed approach attained an accuracy of 1.00 on the privately collected images and the Nerthus dataset, and 0.96 using BNN on the Nerthus frames. Similarly, in the same experiment, an accuracy of 0.87 was achieved on the Kvasir dataset using uncertainty based on thresholding and 0.64 using uncertainty based on BNN. In the future, the accuracy on the Kvasir dataset might be further improved to enhance the prediction rate of stomach infections.
Funding No funding was received for this research work.

Declarations
Conflict of interest All authors declare that they have no conflicts of interest to report regarding the present study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.