1 Introduction

Digital image processing is a hot research area in the modern academic community due to its vital contributions to medicine and to humanity as a whole. It is increasingly used in healthcare to extract useful information about critical diseases from digital medical images such as CT, X-ray, PET, and MRI; this helps in detecting diseases at an early stage and increases the survival chances of patients. Image processing and pattern recognition are interlinked through tasks such as image segmentation, object detection, image classification, and pattern recognition (Goswami et al. 2021). Image segmentation and classification are the most commonly used image processing methods in various domains such as healthcare. For many applications, image segmentation and classification are essential for comprehending, extracting information from, analyzing, and interpreting images. The integration of AI techniques with MIA marks a significant improvement in diagnostic precision in the dynamic landscape of healthcare (Renuka Devi et al. 2022), yet a limited understanding of the efficiency of AI techniques, and of how to select the optimal one for MIA, hinders their effective implementation in healthcare.

In healthcare, issues such as high levels of noise, low resolution, and low contrast may arise during the acquisition of medical images such as MRI, CT, and US. Preprocessing is an essential early stage that deals with such issues and enhances the quality of input images prior to high-level stages such as segmentation and classification (Shirazi et al. 2020). It can help in extracting high-value features that improve the outcomes of segmentation and classification (Tahir et al. 2019). Preprocessing methods are widely used by researchers in medical imaging to enhance performance and accuracy. Many researchers have used various preprocessing methods as an essential step in segmentation approaches (Tamilmani and Sivakumari 2019); (H. Liu et al. 2020); (Nawaz et al. 2022); (Y. Feng et al. 2023); (Napte and Mahajan 2023) as well as in classification approaches (Thayumanavan and Ramasamy 2021); (Biswas and Islam 2021); (Shaheed et al. 2023); (Sejuti and Islam 2023) for medical images. Utilizing preprocessing methods in segmentation and classification approaches has yielded good results in medical image processing.

Image segmentation is also a significant stage in medical image processing, as such images usually exhibit common issues such as large or high-resolution datasets, variability in anatomical structures, and lack of transparency. Therefore, effective and accurate segmentation approaches are highly required to address such issues. It is highly desirable to combine the similar pixels of an image to form homogeneous regions, which plays an integral role in quantitative MIA (Monteiro et al. 2020). Although many image segmentation approaches based on traditional, ML, and DL methods are available in the literature, such as (Khorram and Yazdi 2019); (Cheng and Wang 2020); (Meng et al. 2020); (Kumar et al. 2021); (Vikhe et al. 2022); (Wen et al. 2023), there is still a need for new, efficient image segmentation approaches to enhance the analysis and diagnosis of medical images and obtain more accurate results.

Feature extraction is also an essential stage in MIA, addressing common issues such as the selection of relevant features, data quality, overfitting, and interpretability. This can be achieved by utilizing various feature extraction approaches such as GLCM, LBP, LoG, and PCA. Feature extraction in healthcare refers to the process of identifying and extracting relevant information or features from various medical image modalities such as CT, MRI, PET, X-ray, and US. In the literature, various approaches based on color, texture and shape, and DL have been proposed for medical image feature extraction, such as (Rashighi and Harris 2017); (Liebgott et al. 2018); (Ahmed 2020); (Aswiga et al. 2021); (Madhu and Kumar 2022); (Narayan et al. 2023).

Medical image classification is another significant stage of medical image processing in healthcare, addressing issues such as class imbalance, subjectivity in labeling, and overfitting, which can be tackled using various ML and DL approaches. Medical image classification is the process of assigning a label or category to an image using various methods, which helps in accurately diagnosing diseases such as cancer, bone fractures, and brain abnormalities. Although various DL- and ML-based approaches have been proposed in the literature for medical image classification, such as (Ramasamy & K, 2019); (Thayumanavan and Ramasamy 2021); (Saikia et al. 2022); (Alnaggar, Jagadale, Narayan, et al. 2022a, b); (Vijh et al. 2023); (Rocha et al. 2023), there is still a need for new, efficient image classification approaches to enhance the MIA process and obtain more accurate results.

In the literature, several related works have significantly contributed to the broader understanding of AI in MIA, such as (Altaf et al. 2019); (Sharif et al. 2020); (Xie et al. 2021); (Abdou 2022); (Kshatri and Singh 2023). While the existing literature delves into the applications of AI in MIA, a comprehensive review covering the significant stages of MIA, a taxonomy, and an analysis with a dedicated focus on the efficiency of AI approaches remains a notable gap. This gap, namely the lack of a structured framework to systematically categorize and evaluate the performance of AI approaches for MIA, is the motivation behind this research work. As the field rapidly evolves, the need for a deep understanding of computational complexities, scalability issues, performance metrics, and interpretability becomes increasingly apparent. As a result, the research question emerges as: What are the key efficiency metrics, computational complexities, and scalability considerations inherent in AI approaches for MIA, and how can these approaches be comprehensively reviewed, categorized through a structured taxonomy, and systematically analyzed to inform the optimization of healthcare diagnostics, treatment planning, and outcomes?

This survey aims to address the research question by providing a comprehensive review, taxonomy, and analysis of contemporary and efficient MIA approaches across key stages: preprocessing, segmentation, feature extraction, classification, and visualization. It categorizes existing MIA approaches within each stage based on method, employing a qualitative analysis based on image origin, objective, method, dataset, and evaluation metrics to reveal strengths and weaknesses. Additionally, a comparative analysis evaluates the efficiency of AI-based MIA approaches using five publicly available datasets: ISIC 2018, CVC-Clinic, 2018 DSB, DRIVE, and EM, in terms of accuracy, precision, recall, F-measure, mIoU, and specificity. Moreover, the existing popular public datasets and evaluation metrics are briefly described and analyzed. A quantitative analysis is presented based on publisher, publication year, categories, sub-categories, image origin, and evaluation metrics to illustrate usage trends. This study is novel in that it extends previous surveys by covering the major stages of MIA. It presents a broader review, analysis, and taxonomy of recent MIA approaches, popular datasets, and the utilized evaluation metrics. The aim is to highlight research gaps and issues in the MIA literature, assisting the research community in understanding existing limitations and inspiring the development of new, efficient AI approaches to meet current healthcare needs.

1.1 Paper contributions

The main contributions of this article are as follows:

  • A comparative analysis of the most recent existing surveys based on image origin, stage, method, quantitative and qualitative analysis, dataset, and evaluation metric, highlighting their strengths and weaknesses.

  • A broad, comprehensive review of the most efficient and recent medical image processing approaches.

  • A taxonomy of the most efficient and recent medical image processing approaches based on the utilized method.

  • A comparative analysis of the most efficient and recent MIA approaches based on their image origin, objective, methods, dataset, and evaluation metric.

  • A review and analysis of the most popular public datasets used by medical image processing approaches.

  • A review and analysis of the most commonly used evaluation metrics for evaluating medical image processing approaches.

  • A summary of the possible research issues and challenges in medical image processing, helping to shape the direction of future research.

1.2 Paper organization

This article is structured as follows: related surveys are presented and analyzed in Sect. 2. MIA approaches are reviewed and analyzed in Sect. 3. Section 4 describes the comparative analysis of the prominent methods, while Sect. 5 reviews the popular existing datasets. Section 6 describes the commonly used evaluation metrics. The quantitative analysis is presented in Sect. 7. Section 8 presents the findings and discussion, and the possible open issues and research challenges are briefly highlighted in Sect. 9. Finally, Sect. 10 provides the conclusions and future work.

2 Existing surveys

This section reviews some recent related surveys on MIA approaches and provides a comparative analysis with our proposed taxonomy survey (Altaf et al. 2019); (Sharif et al. 2020); (Xie et al. 2021); (Rashmi et al. 2022); (Zhang et al. 2022a, b); (Abdou 2022); (Arabahmadi and Farahbakhsh 2022); (Suganyadevi, Seethalakshmi et al. 2022a, b); (Kshatri and Singh 2023).

(Altaf et al. 2019) presented a comprehensive overview of recent advances in DL methods such as CNNs and RNNs and their potential for use in MIA across CT, MRI, PET, and ultrasound imaging modalities. They discussed the challenges and potential opportunities associated with DL applications in this domain, such as the need for large datasets, the lack of interpretability of DL models, and the potential biases in the data used to train such models, and highlighted some of the promising directions for future research. (Sharif et al. 2020) provided an overview of the existing state-of-the-art in using ML algorithms such as SVM, ANN, RF, DL, and others for the detection of tumors in multiple organs using CT, MRI, and PET imaging modalities. They also provided an overview of the benefits and limitations of each imaging modality, as well as the various preprocessing techniques for enhancing image quality, and discussed various evaluation metrics such as sensitivity, specificity, and accuracy.

(Xie et al. 2021) conducted a valuable and informative comprehensive survey on incorporating domain knowledge into DL methods for MIA, potentially improving the accuracy and usefulness of results. (Rashmi et al. 2022) provided a methodological review of various traditional and DL-based techniques for analyzing breast histopathological images for diagnostic purposes. (J. Zhang et al. 2022a, b) presented a methodological review of image analysis methods for microorganism counting, including both traditional image processing and DL approaches, to find common technological points. They also covered various preprocessing, segmentation, and classification techniques, as well as different evaluation metrics.

(Abdou 2022) conducted a comprehensive survey of various DNN techniques, with a special focus on CNNs, addressing the tasks of segmentation, classification, detection, and automatic diagnosis in various medical image modalities: X-ray, MRI, CT, ultrasound, and PET. (Arabahmadi and Farahbakhsh 2022) provided a comprehensive review of different kinds of DL techniques on MRI images for brain tumor detection, with a particular focus on CNNs, as well as the existing challenges followed by potential future directions. (Suganyadevi et al. 2022a, b) conducted a comprehensive methodological review of existing deep-learning approaches in medical image processing and analysis. The primary aim of that study is to delineate and implement fundamental principles for conducting research in the domain of medical image processing. Additionally, it strategically identifies and addresses pertinent recommendations to enhance the robustness and efficacy of future endeavors in this field. (Kshatri and Singh 2023) presented a survey on various CNN models in MIA, such as brain MRI, focusing on preprocessing, segmentation, and post-processing as the major stages, and furthermore analyzed current advancements and the associated major challenges.

It is worth mentioning that Table 1 illustrates the availability of numerous reviews and surveys in the field of MIA. However, these surveys fail to consider all the components of MIA compared to our taxonomy, as outlined in Table 1. This survey article stands apart from other surveys due to its exhaustive and up-to-date nature: it provides a comprehensive review of the most recent existing approaches across all medical image processing stages: preprocessing, segmentation, feature extraction, and classification. These approaches are further categorized and qualitatively analyzed based on their performance, strengths, and weaknesses. Moreover, a quantitative analysis is provided of the popular existing datasets and evaluation metrics utilized by MIA approaches. This study also offers a quantitative analysis of the various MIA approaches considering factors such as publication year, sub-categories, and platform.

Table 1 Comparative analysis of the existing surveys for MIA

3 Medical image analysis (MIA)

MIA helps in resolving clinical issues by utilizing various image processing approaches to provide a comprehensive analysis of medical images, with applications ranging from diagnosis to treatment planning and monitoring. The MIA process typically involves several stages: preprocessing, segmentation, feature extraction, and classification. The general MIA process flow is depicted in Fig. 1.

Fig. 1 The general process flow of MIA

3.1 Medical image preprocessing

Common preprocessing approaches are investigated from different perspectives and grouped into three major types: color space transformation, image de-noising, and image enhancement; each type comprises different approaches, as depicted in Fig. 3. Figure 2 shows the process flow of medical image preprocessing. Table 2 presents a comparative analysis of medical image preprocessing approaches.

Fig. 2 The process flow of medical image preprocessing

Fig. 3 Taxonomy of preprocessing approaches

Table 2 Comparative analysis of medical image preprocessing approaches

3.1.1 Color space transformation

Color space transformation is an essential step in MIA that helps enhance the quality of images for better analysis and diagnosis. A color space consists of a color model that specifies pixel values and a mapping method that maps each value to a reproducible set of colors. RGB is a widely used color space, in which color images are commonly represented by three bands: red, green, and blue. The HSI color space is a popular alternative to RGB for enhancing medical images, utilized by (Blotta et al. 2011); (Angelin Jeba and Nirmala Devi 2018); (Yan and Zhang 2019). The HSV color space was utilized by (Feng et al. 2018); (Bai et al. 2018); (Li et al. 2020); (Liu et al. 2021) for the enhancement of visually distracting color images, image segmentation, and image classification. Lab color space transformation was utilized by (Yang et al. 2022a, b); (Saifullah et al. 2022); (Ignacio et al. 2022); (N. J. Abraham et al. 2022) for shadow removal, embryo egg detection, medical endoscopic image enhancement, fire detection, low-light image enhancement, and medical image compression. Overall, color space transformation is a potent instrument in medical image processing that can enhance the precision of analysis and diagnosis.

3.1.2 Image de-noising

De-noising medical images corrupted by noise is a common problem in image processing. An image is usually corrupted by noise during the acquisition and transmission stages, and such noise is not easily eliminated. Noise refers to unwanted changes in the values of image pixels. Image de-noising techniques are utilized to minimize noise while maintaining the important information inside the image. Compared to images from other applications, medical images have low contrast (Mohd Sagheer and George 2020). The most widely utilized denoising approaches are based on wavelets or filters.

3.1.2.1 Wavelet-based approaches

Wavelet-based approaches are widely utilized in medical image de-noising due to their ability to maintain image features while removing noise. The basic idea is to decompose the image into a set of wavelet coefficients and then threshold or shrink them to remove noise. Wavelet thresholding is a common and efficient approach, where the wavelet coefficients are compared to a threshold value and any coefficients that are below the threshold are set to zero. The problem formulation of wavelet thresholding for medical image denoising can be expressed as:

$$y = W^{-1}\left(T\left(W(x)\right)\right)$$
(1)

where \(x\) is the noisy image, \(W\) is the forward wavelet transform, \(W^{-1}\) is its inverse, \(T\) is the thresholding function, and \(y\) is the denoised image. Thresholding the wavelet coefficients removes high-frequency noise while maintaining the low-frequency components that carry most of the image structure.
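As a concrete illustration of Eq. (1), the following minimal sketch applies universal soft thresholding with the PyWavelets library; the wavelet name, decomposition level, and threshold rule are illustrative choices rather than the settings of any surveyed approach.

```python
import numpy as np
import pywt

def wavelet_denoise(image, wavelet="db4", level=3):
    """Soft-threshold wavelet denoising; a minimal sketch of Eq. (1)."""
    # Forward 2D wavelet transform: decompose the image into coefficients.
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Estimate the noise level from the finest diagonal sub-band
    # (robust median estimator: sigma = median(|d|) / 0.6745).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    # Universal threshold of Donoho and Johnstone.
    thresh = sigma * np.sqrt(2 * np.log(image.size))
    # Keep the low-frequency approximation; soft-threshold the detail sub-bands.
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(d, thresh, mode="soft") for d in detail)
        for detail in coeffs[1:]
    ]
    # Inverse transform reconstructs the denoised image.
    return pywt.waverec2(denoised, wavelet)
```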

To improve medical image quality and maintain significant details for better performance in further processing stages such as segmentation and classification, (Gupta and Ahmad 2018); (Benhassine et al. 2021) introduced Discrete Wavelet Transform (DWT)-based optimal thresholding of wavelet coefficients, reducing noise by adaptively selecting the best decomposition level and mother wavelet. (Chen et al. 2019) integrated the DWT with a Modified Median Filter (MMF) to address the issues of image noise, blur, and edge loss in the collection and transmission of medical images. Initially, an improved wavelet threshold was applied to the high-frequency coefficients while the low-frequency coefficients were retained. Then, modified median filtering was applied to the three high-frequency sub-bands. The integrated approach is efficient, outperformed other approaches, and is able to deal with high-precision medical images containing complex noise. (Chervyakov et al. 2020) applied the DWT to examine the influence of quantization noise on the efficacy of DWT filters for 3D medical imaging and to evaluate its effect on image quality. They focused specifically on the impact of quantization noise and did not address other sources of noise or artifacts that may affect image quality. (Elaiyaraja et al. 2022) introduced two DWT-based denoising approaches, using a Gaussian Filter (GF) and a Bilateral Filter (BF), for effectively removing Gaussian noise from medical images and color video sequences. The results showed that the proposed approaches outperformed existing methods in terms of visual quality and objective evaluation metrics such as VIF, IQI, and PSNR while requiring less computational time.

3.1.2.2 Filter-based approaches

Each filter has its strengths and weaknesses and may suit certain types of medical images and noise models. The choice of filter mainly depends on the specific application requirements and the trade-off between noise reduction and preservation of image details. Let \(f(x,y)\) be the noisy image, where \((x,y)\) are the spatial coordinates of the image pixels. The goal of image noise reduction is to obtain an estimate \(g(x,y)\) of the original image by filtering \(f(x,y)\). A filter can be represented by a 2D function \(h(x,y)\) that defines the weights of the filter coefficients. The filtered image \(g(x,y)\) is obtained by convolving the noisy image \(f(x,y)\) with the filter \(h(x,y)\):

$$g(x,y) = h(x,y) * f(x,y)$$
(2)
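As a minimal sketch of Eq. (2), the following assumes a uniform (mean) averaging kernel for \(h(x,y)\); the kernel size is an illustrative choice.

```python
import numpy as np
from scipy import ndimage

def mean_filter(noisy, size=3):
    """Eq. (2) with a uniform (mean) kernel h; a minimal sketch."""
    # A size x size averaging kernel whose weights sum to one.
    h = np.ones((size, size)) / (size * size)
    # g(x, y) = (h * f)(x, y): convolve the noisy image with the kernel.
    return ndimage.convolve(noisy.astype(np.float64), h, mode="reflect")
```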

Median Filter: Many recent researchers have adapted median filters to remove various types of noise from medical images while preserving image edges and details: (Chanu and Singh 2018) proposed a two-stage Quaternion Vector Median Filter (QVMF), and (Sagar et al. 2020) introduced an adaptive median filter (CAMF). To improve the denoising effect, (Chen et al. 2019) integrated an MMF with the DWT, and (Guo et al. 2022) proposed an optimized weighted median filter based on a multilevel threshold. To enhance the protection of clinical images, (Balasamy and Shamia 2021) utilized a Switched Mode Fuzzy Median Filter (SMFMF) approach, which extracts noise-affected pixels and replaces them with the median pixel value, effectively reducing noise in the image. To select the appropriate region for watermarking, clinical image features such as color, shape, and texture are extracted. This increases the efficiency of the watermarking process and reduces computational time. Furthermore, the proposed approach does not compromise image quality while also ensuring the security of the watermark data.

Mean Filter: Methods such as (M. Gupta et al. 2018a, b); (Bonny et al. 2019); (Anam et al. 2020); (Arabi and Zaidi 2021) utilized a mean filter for medical image denoising. To reduce various types of noise, such as salt-and-pepper, speckle, Gaussian, and Poisson noise, from US medical images and restore image quality, (Gupta et al. 2018a, b) applied various filters: mean, Gaussian, bilateral, order-statistic, and Laplacian. The experimental evaluation shows that the mean filter achieves better results for the removal of salt-and-pepper, speckle, and Poisson noise compared to other filters, while the Gaussian filter achieves better results for the removal of Gaussian noise. (Anam et al. 2020) presented a Selective Mean Filter (SMF) approach for de-noising CT medical images while preserving spatial resolution. This approach calculates the mean pixel value by selectively applying a threshold value, based on the image noise, to neighboring pixels in a kernel. Compared to the adaptive mean filter (AMF) and bilateral filter (BF), the SMF approach reduces noise significantly more effectively.

Non-Local Means (NLM) filtering was utilized by (Bonny et al. 2019) and (Arabi and Zaidi 2021) for medical image denoising. (Bonny et al. 2019) proposed a new approach utilizing a modified NLM-based filter and the Bhattacharyya Distance (BD) to reduce speckle noise in US images; the BD measures the similarity between patches in the speckled US image. Compared to other popular approaches, this method is more efficient at reducing noise while also preserving edge details and structural information to a higher degree. (Arabi and Zaidi 2021) introduced a Multiple Reconstruction NLM (MR-NLM) filter for denoising positron emission tomography (PET) medical images, which exploits the similarities between image patches. The NLM filter is applied to multiple PET reconstructions of the same scene, which helps improve the quality of the final denoised image.
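The following minimal sketch illustrates the generic NLM idea using scikit-image; the patch and search-window sizes are illustrative defaults, not the settings of the MR-NLM or BD-based variants surveyed above.

```python
from skimage import img_as_float
from skimage.restoration import denoise_nl_means, estimate_sigma

def nlm_denoise(image):
    """Non-local means denoising via scikit-image; a minimal sketch."""
    img = img_as_float(image)
    sigma = estimate_sigma(img)  # rough noise-level estimate
    # Average similar patches found within a larger search window;
    # patch_size/patch_distance/h are illustrative parameter choices.
    return denoise_nl_means(img, patch_size=5, patch_distance=6,
                            h=0.8 * sigma, fast_mode=True)
```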

Wiener Filter: Several methods, such as (Baselice et al. 2018); (Zhao et al. 2019); (dos Santos et al. 2020); (Habeeb 2021); (Naimi 2022), utilized a Wiener filter to enhance denoising performance and improve medical image quality. (Baselice et al. 2018) introduced an Enhanced Wiener (EW) filter for speckle noise removal in US images, where a Markov Random Field (MRF) was utilized for modeling the noise-free image and controlling the regularization intensity. (Zhao et al. 2019) integrated Wiener filtering with BM3D filtering in the transform domain on a post-thresholding signal to improve denoising performance on ultra-low-dose CT images. To eliminate noise in fundus images, (dos Santos et al. 2020) proposed a Wiener filter for noise reduction to improve the image quality of blood vessels, which enhanced segmentation and classification performance; the suggested approach achieved better diagnosis results. To address blur in the fused image, (Habeeb 2021) combined a Sharpening Wiener Filter (SWF) and the DWT. The experiment yielded positive results, with the proposed fusion demonstrating superior focus operator values compared to other wavelet-domain image fusion techniques. (Naimi 2022) presented an integrated approach that utilizes hybrid DWT thresholding (BayesShrink) and a Wiener filter to improve image quality. The proposed approach reported a more optimal trade-off between smoothness and accuracy compared to the standard DWT and exhibits less redundancy than the SWT.
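A minimal sketch of adaptive Wiener filtering using SciPy follows; the window size is an illustrative choice, not a parameter from the surveyed approaches.

```python
from scipy.signal import wiener

def wiener_denoise(image, size=5):
    """Adaptive Wiener filtering via SciPy; a minimal sketch.

    scipy.signal.wiener estimates the local mean and variance in a
    size x size window and attenuates pixels toward the local mean where
    the local variance is close to the noise variance.
    """
    return wiener(image.astype(float), mysize=size)
```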

Bilateral Filter: Several methods, such as (He et al. 2017); (Rodrigues et al. 2019); (Elhoseny and Shankar 2019); (Anoop and Bipin 2019), utilized bilateral filters for medical image denoising and enhancing image quality. (He et al. 2017) introduced a novel retinal image denoising technique that preserves the details of retinal vessels while efficiently removing image noise and enhancing small areas with weak contrast against the background, which results from limited image quality and low blood flow in the vessel. It outperforms other techniques in detecting and retaining thin vascular structures by integrating the advantages of both the bilateral filter (BLF) and the matched filter (MF). (Rodrigues et al. 2019) provided a new approach for speckle noise de-noising based on S-median thresholding and the fast bilateral filter. When compared to existing thresholding techniques, this method yielded the best results: image quality improved by 14.13% in PSNR, 4.96% in MSSIM, and 0.70% in β.

To remove noise and retain information features such as edges and surfaces, (Elhoseny and Shankar 2019) optimized a new de-noising method using a bilateral filter and a convolutional neural network. In comparison to existing filters and some classifiers, the optimized method achieves a PSNR of 47.52 dB and an error rate of 1.23. (Anoop and Bipin 2019) proposed an Enhanced Grasshopper Optimization Algorithm (EGOA)-based method for selecting bilateral filter parameters in medical MR images to eliminate impulse and Rician noise. Across all quality measurements, the bilateral filter with the proposed optimal parameters produced better results for all images with different noise types.
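The following minimal sketch shows generic edge-preserving bilateral filtering with scikit-image; both smoothing parameters are illustrative values rather than the optimized parameters selected by EGOA.

```python
from skimage import img_as_float
from skimage.restoration import denoise_bilateral

def bilateral_denoise(image, sigma_color=0.05, sigma_spatial=3):
    """Edge-preserving bilateral filtering; a minimal sketch.

    Pixels are averaged with weights that fall off with both spatial
    distance (sigma_spatial) and intensity difference (sigma_color),
    so smoothing stops at strong edges.
    """
    return denoise_bilateral(img_as_float(image),
                             sigma_color=sigma_color,
                             sigma_spatial=sigma_spatial)
```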

Gabor Filter: (Licciardo et al. 2018) and (Chen et al. 2021a, b, c, d) developed approaches based on the Gabor filter for medical image denoising. (Licciardo et al. 2018) designed a Gabor filter to improve a portion of multiple filtering operations for enhancement, noise removal, and mitigation in medical imaging; the study presents three distinct GF architectures, each with different accuracy, area, power, and timing tradeoffs. (Chen et al. 2021a, b, c, d) extended the Difference of Gaussian (DoG) filter and nonlocal low-rank regularization to eliminate Rician noise from magnetic resonance imaging (MRI) while preserving edge features. The low-rank regularization model is built using a combination of nonlocal self-similarity evaluation (NSS) and the extended DoG filter.
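A minimal sketch of Gabor filtering with scikit-image follows; the frequency and orientation are illustrative values.

```python
from skimage import img_as_float
from skimage.filters import gabor

def gabor_response(image, frequency=0.2, theta=0.0):
    """Gabor filtering via scikit-image; a minimal sketch.

    A Gabor filter is a Gaussian envelope modulated by a sinusoid at the
    given frequency and orientation theta.
    """
    real, imag = gabor(img_as_float(image), frequency=frequency, theta=theta)
    # The magnitude combines the quadrature pair into an energy response.
    return (real ** 2 + imag ** 2) ** 0.5
```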

3.1.3 Image enhancement

The main goal of the enhancement stage is to produce images optimized for a human observer; developers also use enhancement methods as a preprocessing step to produce enhanced images that reveal certain features better than their original appearance for further processing levels and analyses (Rundo et al. 2019). The problem formulation of medical image enhancement can be represented as:

$$y(i,j) = T(x(i,j))$$
(3)

where \(x\) is the original image, \(y\) is the enhanced image, \((i,j)\) are the pixel coordinates, and \(T\) is the transformation function. The transformation function \(T\) maps the intensity values of the original image \(x\) to new intensity values in the enhanced image \(y\) and might have several forms based on the technique used.
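As one simple instance of the transformation \(T\) in Eq. (3), the following sketch applies a power-law (gamma) mapping; the gamma value is an illustrative assumption, not taken from any surveyed approach.

```python
import numpy as np

def gamma_correct(image, gamma=0.7):
    """One possible transformation T in Eq. (3): power-law (gamma) mapping.

    Assumes a grayscale image with float intensities; gamma < 1 brightens
    dark regions, gamma > 1 darkens bright regions.
    """
    x = image.astype(np.float64)
    lo, hi = x.min(), x.max()
    # Normalize to [0, 1], apply the power law, and map back to the
    # original intensity range.
    y = ((x - lo) / (hi - lo + 1e-12)) ** gamma
    return y * (hi - lo) + lo
```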

3.1.3.1 Contrast stretching (CS)

Contrast stretching is a commonly utilized method for medical image enhancement that aims to enhance the perceptibility of subtle details within an image. The method achieves its objective by selectively amplifying or attenuating the contrast between distinct intensity levels. We reviewed (Yang et al. 2018); (Ruikar et al. 2019); (Panse and Gupta 2021); (Malik et al. 2022). (Yang et al. 2018) introduced a high-capacity reversible data hiding (RDH) scheme with contrast stretching to enhance the contrast of medical images while also allowing for secure data storage and transmission, particularly in the regions of interest (ROI), without compromising image quality. The introduced approach may not suit all kinds of medical images, particularly those with low contrast or complex textures.

To improve the quality of CT images by reducing the effects of artifacts caused by factors such as beam hardening, metal objects, and motion, (Ruikar et al. 2019) suggested a contrast stretching approach based on histogram modeling and intensity transformation methods to increase the dynamic range of pixel values and smooth the image; however, the approach shows some loss of detail. (Panse and Gupta 2021) utilized a hybrid approach consisting of contrast stretching (CS) and Brightness Preserving Dynamic Histogram Equalization (BPDHE) to enhance the low contrast ratio of medical images. To improve the contrast of lesions for segmenting and classifying skin lesions, (Malik et al. 2022) presented a Differential Evolution-Bat Algorithm (DE-BA) to calculate the parameters used in brightness-retaining CS transformation. The presented approach was evaluated on various datasets, including ISIC 2016, 2017, 2018, and PH2, and it significantly improved segmentation outcomes compared to other segmentation models.
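A minimal sketch of percentile-based linear contrast stretching follows; the percentile cut-offs are illustrative defaults, not parameters from the surveyed approaches.

```python
import numpy as np

def contrast_stretch(image, low_pct=2, high_pct=98):
    """Percentile-based linear contrast stretching; a minimal sketch."""
    lo, hi = np.percentile(image, (low_pct, high_pct))
    # Linearly map the [lo, hi] intensity range to [0, 1] and clip outliers.
    stretched = (image.astype(np.float64) - lo) / (hi - lo + 1e-12)
    return np.clip(stretched, 0.0, 1.0)
```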

3.1.3.2 Histogram equalization

Histogram equalization is a common method for improving the clarity and contrast of grayscale medical images such as X-ray, CT, MRI, and ultrasound. (Agarwal and Mahajan 2018) utilized Range Limited Weighted Histogram Equalization (RLWHE) along with adaptive gamma correction followed by homomorphic filtering to enhance low-contrast medical images while preserving their details. RLWHE showed better contrast enhancement, higher detail preservation, and more controlled over-enhancement than other methods. To improve the visibility of diseased details in medical imaging, (Kuo and Wu 2019) introduced the Gaussian probability bi-histogram equalization median plateau limit (GPBHEPL-D) approach. (Subramani and Veluchamy 2020) developed an adaptive fuzzy grey-level difference histogram equalization approach to minimize noise and improve interpretation accuracy; the proposed approach provides a distinct roadmap for thoroughly analyzing minute features and contaminated parts.

To improve medical image quality by enhancing contrast, (Sonali et al. 2019); (dos Santos et al. 2020); (Lucknavalai and Schulze 2020); (Dinh and Giang 2022) utilized Contrast Limited Adaptive Histogram Equalization (CLAHE), which improved further image processing tasks such as segmentation, feature extraction, classification, recognition, and fusion. (Sonali et al. 2019) also employed a combination of filters for de-noising color fundus images. To improve the data density and specificity of medical images, (Pashaei and Pashaei 2023) presented a Gaussian Quantum Arithmetic Optimization Algorithm (GQAOA) with CLAHE, where CLAHE's optimal clip limit and other parameters are calculated by GQAOA utilizing a novel multi-objective fitness function.
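The following minimal sketch applies CLAHE via scikit-image; the clip limit and tile size are illustrative defaults, whereas approaches such as GQAOA tune these parameters automatically.

```python
from skimage import exposure, img_as_float

def apply_clahe(image, clip_limit=0.01, kernel_size=None):
    """CLAHE via scikit-image; a minimal sketch, not a surveyed pipeline."""
    img = img_as_float(image)
    # equalize_adapthist performs contrast-limited histogram equalization
    # on tiles of the image and interpolates the results between tiles.
    return exposure.equalize_adapthist(img, kernel_size=kernel_size,
                                       clip_limit=clip_limit)
```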

To enhance the clarity, contrast, and overall quality of medical images as well as to provide computational assistance, (Acharya and Kumar 2021); (Fan et al. 2023) proposed adaptive histogram equalization (AHE) approaches. (Acharya and Kumar 2021) used an AHE method based on a genetic algorithm (GAAHE) that adapts the probability density function (PDF). The histogram is subdivided innovatively, considering the exposure threshold and the optimal threshold for maintaining brightness while minimizing data loss. The threshold parameters are optimized using a genetic algorithm, guided by the proposed multi-objective fitness function. The PDF of every sub-histogram is then adjusted to improve the image. (Fan et al. 2023) used a novel transform function to enhance the contrast and brightness of medical images, with the improved sparrow search (ISS) method utilized to optimize two parameters of the transform function. Following that, the output image of the preceding step is equalized using contrast-limited AHE to standardize the pixel distribution.

3.1.3.3 Edge enhancement

Edge enhancement is a common technique in image processing used to highlight and emphasize the edges or boundaries of objects and features within an image. To obtain edge-enhanced and denoised X-ray fluoroscopy images while maintaining features, (Luo et al. 2022) developed an edge-enhancement densenet (EEDN) approach that boosts edge sharpness. A CNN-based denoiser is first employed to produce an initial denoising output. Then, a group of interacting ultra-dense blocks is used to represent edge features, and an attention block is used to extract contours reflecting the edge information. The combination of the initial denoising result and the improved edges generates the final X-ray image.
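As a simple hand-crafted counterpart to learned approaches such as EEDN, the following sketch applies classical unsharp masking; the radius and amount are illustrative parameters.

```python
from skimage import img_as_float
from skimage.filters import unsharp_mask

def enhance_edges(image, radius=2, amount=1.5):
    """Unsharp masking; a classical edge-enhancement sketch (not EEDN).

    The original image plus an amplified high-frequency residual
    sharpens edges and fine details.
    """
    return unsharp_mask(img_as_float(image), radius=radius, amount=amount)
```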

3.2 Medical image segmentation

The simplification and depiction of medical images into something more useful and easier to evaluate is known as image segmentation. Image segmentation is a technique for identifying objects and boundaries (lines, curves, etc.) in images. More precisely, it assigns a label to each pixel in an image such that pixels with the same label share similar features. The result of image segmentation is a set of segments that collectively cover the full image, or a set of features extracted from the image. Figure 4 shows the process flow of medical image segmentation. Table 3 presents a comparative analysis of medical image segmentation approaches.

Fig. 4 The process flow of medical image segmentation

Table 3 Comparative analysis of medical image segmentation approaches

The general problem formulation of Medical Image Segmentation (MIS) may be expressed through a cost or objective function. The segmentation is optimized so that it meets a specific criterion, and the cost function balances similarity measures within and across regions. In general, the cost function may be expressed as:

$$C(S) = D(S) + \lambda R(S)$$
(4)

where \(S\) is the segmentation, \(D(S)\) is the data term that evaluates the degree of image-to-segmentation similarity and may take several forms depending on the segmentation technique used, \(R(S)\) is the regularization term that promotes segmentation smoothness, and \(\lambda\) is a weighting parameter that balances the data term against the regularization term. This section reviews existing MIS approaches in five categories: thresholding-based, region-based, edge-based, clustering-based, and DL-based (supervised and unsupervised). Figure 5 presents a taxonomy of MIS approaches based on the utilized method.

Fig. 5 Taxonomy of segmentation approaches

3.2.1 Thresholding-based approaches

Thresholding is a significant segmentation technique that has proven effective in different applications. Image thresholding seeks suitable threshold values that partition the image histogram. Thresholding is of two types: global thresholding and local thresholding. Global thresholding approaches apply a single threshold value to the given image, while local thresholding approaches apply multiple threshold values to different regions of an image.

Global Thresholding: Global thresholding is a basic yet efficient image segmentation approach that separates an image into two regions: the foreground (object of interest) and the background. Otsu (1996) proposed one of the most commonly used global thresholding techniques for image segmentation. Otsu presented an iterative procedure over all conceivable intensity values in the image, in which the intensity levels are split into two clusters (background and foreground). For each candidate threshold, a measure of spread of the pixel intensities in each cluster is computed; the purpose is to identify the threshold value for which the sum of the foreground and background spreads is at its minimum. (Sharma et al. 2019); (Mandyartha et al. 2020); (Kalyani et al. 2021) utilized Otsu's method to develop their approaches for MIS.
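A minimal sketch of global Otsu thresholding with scikit-image follows; it returns a binary foreground/background mask.

```python
from skimage.filters import threshold_otsu

def otsu_segment(image):
    """Global Otsu thresholding; a minimal sketch of the method above."""
    # threshold_otsu scans all candidate thresholds and returns the one
    # minimizing the intra-class variance (equivalently, maximizing the
    # between-class variance) of foreground and background.
    t = threshold_otsu(image)
    return image > t  # binary mask: foreground vs. background
```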

(Sharma et al. 2019) integrated a Differential Evolution (DE) algorithm with the Otsu method to find an optimal threshold value for brain tumor segmentation. The integrated approach utilizes the GLCM to identify features and then segments the MRI images. It also utilizes the Network Fitting Tool (nf-tool) to train a neural network for future use such as classification. The results of this approach showed increased efficiency compared to other methods. (Mandyartha et al. 2020) integrated global and local thresholding approaches to compute multiple threshold values for white blood cell image segmentation; the proposed approach demonstrated robustness and efficiency in its segmentation outcomes. (Kalyani et al. 2021) introduced a multilevel thresholding (MLT) approach that employs the Kapur and Otsu objective functions for medical image segmentation. However, MLT's execution time increases with the number of threshold levels required to locate the optimal threshold. To overcome this challenge, the robust teaching-learning-based optimization (TLBO) algorithm was employed. The experimental findings reveal that Otsu-based TLBO outperforms Kapur-based TLBO with exceptional results.

For segmenting MR brain images, (N. Gupta et al. 2018a, b); (Khorram and Yazdi 2019) presented adaptive thresholding methods. (Gupta et al. 2018a, b) utilized RLCP (Run Length of Centralized Patterns) to extract binary patterns and run-length matrix features to improve the performance of glioma segmentation. The extracted features work well with the Naive Bayes classifier to detect and classify tumors. (Khorram and Yazdi 2019) used an ant colony algorithm for MR brain image segmentation at multiple levels of complexity.

Local Thresholding: Local thresholding, also known as adaptive thresholding, is an image segmentation approach that addresses some of the drawbacks of global thresholding. In contrast to global thresholding, which assigns a single threshold value to the whole image, local thresholding assigns different threshold values to different regions or neighborhoods within the image. This method is especially beneficial when the image has variable lighting levels or considerable local differences between object and background intensity. Using the cuckoo optimization algorithm (COA), (Bouaziz et al. 2015) presented a multilevel image thresholding technique (MECOAT); inspired by the cuckoo bird's behavior, COA identifies thresholds that minimize the entropy of the segmented pixel clusters. (Senthilkumaran and Vaithegi 2016) introduced two local thresholding techniques, Niblack and Sauvola, which use the local mean and standard deviation to filter out background noise; the Niblack algorithm produces results superior to those of the Sauvola method.
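The following minimal sketch computes Niblack and Sauvola local thresholds with scikit-image; the window size and k are illustrative values.

```python
from skimage.filters import threshold_niblack, threshold_sauvola

def local_threshold(image, window_size=25, k=0.2):
    """Niblack and Sauvola local thresholding; a minimal sketch.

    Both compute a per-pixel threshold from the local mean m and standard
    deviation s within a sliding window.
    """
    # Niblack: T(x, y) = m(x, y) + k * s(x, y)
    t_niblack = threshold_niblack(image, window_size=window_size, k=k)
    # Sauvola refines Niblack with a dynamic range R of the std deviation:
    # T(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1))
    t_sauvola = threshold_sauvola(image, window_size=window_size, k=k)
    return image > t_niblack, image > t_sauvola
```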

3.2.2 Region-based approaches

Region-based segmentation approaches in medical images identify and segment areas of interest that share commonalities in intensity values or other image characteristics. Generally, region-based segmentation approaches include region growing, region merging and splitting, and watershed.

3.2.2.1 Region growing

Region growing is a fundamental image segmentation technique that divides an image into regions or segments according to similarity criteria. The idea of region growing is to begin with one or more seed points (pixels or small regions) and repeatedly add neighboring pixels to the growing region if they fulfill certain similarity criteria. The process is repeated until no more pixels can be added to the region.
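A minimal sketch of intensity-based region growing from a single seed follows; the 4-connectivity and fixed-tolerance similarity rule are illustrative simplifications of the criteria used in the surveyed approaches.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=0.1):
    """Minimal intensity-based region growing from a single seed point.

    A pixel joins the region if its intensity differs from the seed
    intensity by at most tol (an illustrative criterion).
    """
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    region[seed] = True
    while queue:
        r, c = queue.popleft()
        # Examine the 4-connected neighbors of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(float(image[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region
```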

To improve the accuracy of MIS, (Raja et al. 2018); (Reddy and Chenna Reddy 2018); (Swaroopa et al. 2022); (Feng et al. 2023) developed approaches based on region growing. (Raja et al. 2018) integrated Tsallis entropy and the seed region growing method to identify the ROI in medical MRI images using the T1C modality. The Chaotic Bat Algorithm (BA) supervises and controls the entire functionality of the proposed method, and Haralick texture features are identified to improve segmentation accuracy. (Swaroopa et al. 2022) utilized the GLCM for extracting texture features from cell images, and the image was segmented using the region-growing method; even for darker cell images, the proposed approach was computationally fast and efficient. (Feng et al. 2023) developed an interval iterative multi-thresholding segmentation approach based on a hybrid spatial filter with region growing for segmenting medical brain MRI images.

To segment breast tumor ROIs from US images and reduce the false positive rate, (Zeebaree et al. 2019) proposed an approach using local pixel information and a neural network followed by region growing to obtain accurate segmentation. Different features were then extracted from each pixel using local binary patterns (LBP), and feature fusion was used to combine all features and feed them into an artificial neural network (ANN) classifier. Optimized region-growing approaches were introduced in (Anshad et al. 2019) and (Cheng and Wang 2020) for medical image segmentation. (Anshad et al. 2019) applied a modified region-growing technique for segmenting chondroblastoma from medical images, while (Z. Cheng & Wang 2020) utilized an optimized region-growing approach to avert the segmentation errors that occur at the P1 P3 interfaces between the two phases with the highest and lowest grayscale intensity levels. (Cui et al. 2019a, b) introduced a co-constraint-based automated region-growing segmentation approach for imaging brain tumors; the experimental findings demonstrate that the suggested technique provides superior segmentation accuracy and computing efficiency compared to the conventional region-growing approach. (Biratu et al. 2021) suggested an enhanced region-growing approach with automatic initialization of seed points to detect abnormal regions in brain images, and (Khan et al. 2023) proposed a light CNN approach that utilizes watershed-based region-growing segmentation for detecting COVID in both CT scans and X-rays.

3.2.2.2 Region merging and splitting

Region merging and splitting are image segmentation post-processing methods utilized to improve the results of segmentation techniques by adapting the outer boundaries of segmented regions, either by merging similar nearby regions to produce larger, more coherent regions or by splitting regions into smaller, more homogeneous sub-regions. (Nija et al. 2020) presented an approach for segmenting the Optic Disc (OD) in cases of retinal illness based on statistical region merging (SRM) and morphological operations; the SRM approach was shown to be superior to existing OD segmentation techniques. (Salih and Viriri 2020) combined the advantages of the pixel-based Markov random field (MRF) and stochastic region merging approaches to overcome the limitations of the MRF approach in skin lesion segmentation, which stem from obstacles including irregular and fuzzy borders, noise and artifacts, and a lack of contrast between lesions. (Liu et al. 2020) introduced an approach that integrates a deep U-Net with superpixel region merging to optimize medical image segmentation; bilateral filtering was utilized to reduce background noise and improve soft tissue contrast at the image boundaries, and a normalization layer was applied to avoid overfitting and boost sensitivity. (Qiao et al. 2021) developed a Maximal Similarity-Based Region Merging (NMSRM) approach for segmenting pancreas images, utilizing the superpixel technique as a preparation step to improve the edges of pancreas CT images.

3.2.2.3 Watershed

The watershed is a sophisticated image segmentation technique frequently used to split an image into discrete regions or segments depending on the topography of the image. It is especially helpful in medical imaging when objects or regions of the image are divided by intensity or gradient ridges. To fully automate the segmentation and detection of breast tumor lesions in ultrasound images, (Bafna et al. 2018) proposed a marker-based watershed segmentation approach; however, the presented technique fails to detect tumor regions in images with shadow areas. To effectively identify cancer lesions in computed tomography (CT) images of the liver, (Das et al. 2019) developed a new method known as watershed Gaussian-based DL (WGDL). The proposed model depends on marker-controlled watershed transformation and a Gaussian mixture model. The key benefit of this automatic identification method is that it uses a deep neural network classifier to obtain the best accuracy value.
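The following minimal sketch shows a generic marker-controlled watershed on a binary mask using scikit-image; placing markers at local maxima of the distance transform is a common textbook recipe, not the marker strategy of any specific surveyed method.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def marker_watershed(binary_mask):
    """Marker-controlled watershed on a binary mask; a minimal sketch."""
    # Distance from every foreground pixel to the nearest background pixel.
    distance = ndi.distance_transform_edt(binary_mask)
    # Local maxima of the distance map give one marker per object.
    coords = peak_local_max(distance, min_distance=10, labels=binary_mask)
    markers = np.zeros_like(distance, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # Flood the inverted distance map from the markers.
    return watershed(-distance, markers, mask=binary_mask)
```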

To obtain accurate segmentation of regions, (Sivakumar and Janakiraman 2020) and (Vikhe et al. 2022) utilized improved watershed approaches. (Sivakumar and Janakiraman 2020) developed an ECED and MWS model (enhanced Canny edge detection and modified watershed segmentation) to segment the Region of Interest (ROI) from MRI images; MWS was utilized for segmentation and ECED for edge detection, and the presented model successfully segmented brain tumors with a high accuracy rate. (Vikhe et al. 2022) proposed an improved marker-controlled watershed approach to accurately detect suspicious regions in mammograms. The method uses morphological operations and threshold techniques to suppress artifacts and the pectoral region and computes the magnitude gradient to obtain mass edges. Internal and external markers are then determined, and the watershed transform is applied to a modified gradient image to isolate suspicious regions. This approach improved mass detection results.

To tackle under- and over-segmentation issues, (Kucharski and Fabijańska 2021) and (Napte and Mahajan 2023) developed approaches based on the watershed method. (Kucharski and Fabijańska 2021) introduced an integrated approach to address the over-segmentation issue in corneal endothelium cell images. The approach combines marker-driven watershed segmentation of corneal endothelial cells with an encoder-decoder CNN trained in a sliding-window fashion to predict the probability of cell centers (markers) and cell borders. These predicted markers are then utilized for watershed segmentation of the edge probability maps output by the neural network. The watershed-CNN showed improvement in terms of cell size, DICE coefficient, and Modified Hausdorff Distance (MHD). (Napte and Mahajan 2023) introduced a marker-based watershed transform approach in which a modified double-stage Gaussian filter (MDSGF) is utilized to improve contrast and maintain edge and texture details; the introduced approach reduced under- and over-segmentation of the liver region.

3.2.3 Edge-based approaches

Edge-based approaches are commonly utilized in MIS due to their ability to identify the boundaries that separate distinct regions or structures within an image. (Zheng et al. 2018) introduced an integrated approach for medical image segmentation that combines an SVM with the graph cuts algorithm. A localized training scheme is utilized to train a classifier for each pixel based on the target image information, after which the graph cuts method combines edge, local, and remote-local information from the SVM result to improve post-processing. The proposed approach has been shown to enhance both segmentation accuracy and classification results. For segmenting medical images with intricate borders, (Ullah et al. 2019) presented a curve-evolution technique based on an edge-following approach. (Liu et al. 2019a, b, c) developed a weighted edge-based approach for segmenting medical, synthetic, and real natural images with varying levels and types of noise. The approach is based on multi-local statistical information and incorporates length and regional coefficients, as well as a modified edge stop function. The experimental results showed that this approach achieved higher segmentation accuracies, indicating its effectiveness and robustness.

To perform MIS, (Hashemi et al. 2019); (Fang et al. 2020) developed edge-based active contour approaches. (Hashemi et al. 2019) introduced an approach that employs a novel edge-based energy function incorporating both local and global image details; the authors demonstrate that it performs well for segmenting various types of medical images, such as brain MRI and retinal OCT images. (Fang et al. 2020) used an edge-based vector-valued active contour approach to simultaneously segment abnormal tissue regions in multi-modal medical images. Region-based information and hybrid mean intensities were utilized for each modality's signal characteristics. Additionally, a two-dimensional vector field is utilized across the image modalities to constrain the segmentation results based on edge information. This approach was evaluated on brain MRI-CT and lung PET-CT images and showed potential for routine clinical use in the near future.

For segmenting blood vessels in retinal images, (Ilayarajaa and Logashanmugam 2020) utilized a morphological and Canny edge detection approach on color fundus images from the DRIVE dataset. To tackle the issue of weak edges in MIS, (Cao et al. 2021) presented an approach called the Edge and Neighborhood Guidance Network (ENGNet). The ENG module is designed to exploit edge information and fine-grained neighborhood spatial information simultaneously, and a multi-scale adaptive selection (MAS) module extracts multi-scale context information and adaptively fuses features at various scales to improve the classification of edge and neighborhood pixels. Experimental results reported that the ENGNet approach effectively reduces misclassification in weak edge regions and outperformed other existing methods. (Wang et al. 2022a, b) introduced a boundary-aware context neural network (BA-Net) to gather more comprehensive context and preserve fine spatial details for 2D MIS. The approach consists of three modules: (1) a pyramid edge extraction module for extracting edge information at multiple granularities; (2) a mini multi-task learning module for segmenting object masks and detecting lesion boundaries simultaneously; and (3) a cross-feature fusion module for aggregating multilevel features from the whole encoder network. By cascading the three modules, BA-Net captures richer context and fine-grained features at every stage of the segmentation process.
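A minimal sketch of Canny edge detection with scikit-image follows; the smoothing sigma and hysteresis thresholds are illustrative values.

```python
from skimage import img_as_float
from skimage.feature import canny

def detect_edges(image, sigma=2.0, low=0.1, high=0.2):
    """Canny edge detection; a minimal sketch for boundary extraction.

    sigma controls Gaussian smoothing before gradient computation;
    low/high are the hysteresis thresholds.
    """
    return canny(img_as_float(image), sigma=sigma,
                 low_threshold=low, high_threshold=high)
```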

3.2.4 Clustering-based approaches

Clustering refers to the procedure of grouping pixels or voxels within an image into distinct clusters, primarily determined by similarities in intensity values or other pertinent characteristics. The clusters can subsequently be employed to demarcate the boundaries of distinct regions or structures within the image. Clustering-based segmentation approaches include K-means clustering, fuzzy clustering, and hierarchical clustering.

3.2.4.1 K-means clustering

K-means clustering is a well-known unsupervised ML approach utilized in image segmentation for grouping similar pixels or regions. It is especially effective for dividing an image into multiple segments according to pixel similarities, with each segment indicating a distinct object or region in the image. K-means clustering iteratively divides an image into K clusters, where K is a user-defined parameter representing the desired number of segments. To segment brain lesions in multiple imaging modes, (Agrawal et al. 2018) proposed a hybrid method that combines K-means clustering and morphological operations. A median filter was utilized to remove impulsive noise from the brain images, K-means clustering was then used for segmentation, and Sobel boundary detection and morphological processing were applied for brain tumor extraction. The presented method scored a high accuracy value, and the statistical significance test for the automatically segmented lesions showed high significance.
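The following minimal sketch clusters pixels by gray value with scikit-learn's K-means; richer variants would add color or texture features, and K = 3 is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, k=3, seed=0):
    """Intensity-based K-means segmentation; a minimal sketch.

    Each pixel is a one-dimensional feature (its gray value).
    """
    pixels = image.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    # Reshape the per-pixel cluster labels back into the image grid.
    return km.labels_.reshape(image.shape)
```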

K-means clustering was utilized for segmenting various medical images in (Capor Hrosik et al. 2019); (Braiki et al. 2020); (Hannah Inbarani et al. 2020); (Kumar et al. 2021); (Nawaz et al. 2022); and (Faragallah et al. 2023). (Capor Hrosik et al. 2019) proposed a K-means clustering approach combined with the firefly algorithm for segmenting brain images and emphasizing primary brain tumors such as glioma, metastatic adenocarcinoma, metastatic bronchogenic carcinoma, and sarcoma in PET, MRI, and SPECT images; Otsu's criterion is utilized as the fitness function to enhance segmentation performance. (Braiki et al. 2020) developed a method for automatically segmenting microscopic images of dendritic cells using the Chan-Vese active contour and K-means clustering. Dendritic cells are initially detected using K-means clustering and morphological operations; then, to segment the detected cells more precisely, a region-based Chan-Vese active contour technique is applied; finally, dendritic cells are separated using eccentricity-based filtration. In comparison to recent models, it has the highest average accuracy rate (99.42%).

For segmenting the nucleus in images of leukemia, (Hannah Inbarani et al. 2020) presented a hybrid histogram-based soft covering rough K-means clustering (HSCRKM) approach. From the nuclear segmentation image, many features were extracted, including GLCM, color, and shape-based features; the cells were then classified into malignant and benign classes using ML classification methods. (Kumar et al. 2021) suggested a model to detect and segment tumors based on the rough K-means clustering method. Initially, MRI images are preprocessed to prepare them for segmentation. The Improved Gabor Wavelet Transform (IGWT) is then used to convert the images into a transform domain, several features are captured from every image in the database, and the oppositional fruit fly algorithm (OFFA) chooses the significant features. A support vector machine is then utilized to classify an image as normal or abnormal. Finally, the region of interest is segmented from abnormal images based on the rough K-means clustering approach.

The faster region-based CNN (RCNN) and fuzzy K-means clustering (FKM) were integrated by (Nawaz et al. 2022) for segmenting skin melanoma, and the method was evaluated on a variety of clinical images. Initially, image preprocessing is performed to eliminate noise and lighting issues and improve the visual information before the faster RCNN produces a fixed-length feature vector. The melanoma-affected area of skin is then segmented, with varying sizes and edges, using FKM. (Faragallah et al. 2023) developed a K-means clustering and Otsu's thresholding approach over different transform domains to segment and localize brain tumor regions in a reasonable time; adaptive histogram equalization and local contrast enhancement improved the overall performance of this approach, resulting in precise brain tumor localization.

3.2.4.2 Fuzzy clustering

Fuzzy clustering, specifically Fuzzy C-means (FCM), is an image segmentation variant of the standard K-means clustering method. Unlike K-means, which allocates each pixel to a single cluster, FCM allows each pixel to have varying degrees of membership in several clusters. This flexibility makes FCM a strong method for addressing the uncertainty and ambiguity frequently present in image segmentation tasks. To this end, (Aruna Kumar and Harish 2018) and (Santos et al. 2018) developed approaches based on the Fuzzy C-means method to enhance the segmentation accuracy of MIA. (Aruna Kumar and Harish 2018) introduced an Intuitionistic Fuzzy C-means (IFCM) approach for MIS, applying the principles of intuitionistic fuzzy set theory to calculate certain parameters; this approach utilizes the Hausdorff distance to measure the distance between cluster centers and pixels. The evaluation, incorporating various cluster validity functions, reported that the IFCM approach outperforms other existing methods. To segment ROIs in medical images, (Santos et al. 2018) utilized a seeded Fuzzy C-means clustering approach; examined on a total of 2200 images collected from various datasets for leukemia, skin cancer, cervical cancer, and glaucoma, the proposed approach demonstrated an excellent Kappa index in the majority of the tests.
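A minimal NumPy sketch of the standard FCM iteration on one-dimensional pixel intensities follows; it illustrates the alternating center/membership updates rather than any surveyed variant such as IFCM.

```python
import numpy as np

def fuzzy_cmeans(pixels, c=3, m=2.0, iters=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-means on 1-D pixel intensities; an illustrative
    sketch of the standard FCM iteration, not a surveyed variant.

    pixels: array of shape (N,); returns (centers, memberships of shape (N, c)).
    """
    rng = np.random.default_rng(seed)
    x = pixels.reshape(-1, 1).astype(np.float64)
    # Random initial membership matrix with rows summing to one.
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = u ** m
        # Update cluster centers as membership-weighted means.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Update memberships from inverse relative distances:
        # u_ij proportional to d_ij^(-2 / (m - 1)).
        d = np.abs(x - centers.T) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    return centers.ravel(), u
```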

Several fuzzy clustering based methods for brain image segmentation were proposed, such as (Huang et al. 2019); (Jiang et al. 2019a, b); (Jiang et al. 2020); (Yang et al. 2020); (Pitchai et al. 2021); (Cai et al. 2021); and (Hooda and Verma 2022). (Huang et al. 2019) proposed a new approach for image segmentation that uses the FCM clustering algorithm and rough set theory. The image is split into many small regions utilizing the indiscernibility relation of attributes. Then, the attribute value table is built using the segmentation results of FCM under various clustering numbers. The weight values for each attribute are then generated through value reduction and utilized as the foundation for calculating the difference between regions, after which each region's similarity is assessed using an equivalence relation determined by the difference degree. Finally, regions are merged and segmentation is achieved using the final equivalence relation determined by similarity. The efficacy of traditional methods in MIS still requires enhancement because of the low contrast in grayscale images, the uncertainty and complexity of MRI images, and individual variability. For segmenting MR brain images, (Jiang et al. 2019a, b) presented a distributed multitask fuzzy c-means (MT-FCM) clustering approach for extracting information shared by several clustering tasks.

A negative-transfer-resistant fuzzy clustering approach with a shared cross-domain transfer latent space (LSS-FTC-NTR) was proposed by (Jiang et al. 2020) for brain CT image segmentation. This approach incorporates a negative-transfer-resistant mechanism that can identify and resist negative transfer of source knowledge. The representation of image data from various domains was unified by applying maximum mean discrepancy (MMD) in the shared latent space, facilitating knowledge transfer across domains. The proposed approach enhanced segmentation performance by leveraging source knowledge and addressing noise. (Yang et al. 2020) integrated fuzzy clustering with the level set approach via a dynamic limited term in a novel energy function to boost the performance of MIS. The outputs of fuzzy clustering are used directly, allowing the level set evolution to be controlled. (Pitchai et al. 2021) developed a hybrid method of fuzzy K-means and artificial neural networks (ANN) for brain tumor segmentation. The Wiener filter is first used to preprocess the MRI images in order to remove noise. Using the Crow Search Optimization Algorithm (CSOA), the significant attributes are selected from the extracted features. The classification of normal and abnormal images is done using an ANN. Finally, the tumor region is segmented using the fuzzy K-means method. Medical images are notoriously difficult to segment due to their great complexity and inherent noise. To solve these problems, (Cai et al. 2021) introduced a composite network model architecture based on quadratic polynomial-guided fuzzy C-means and a dual attention mechanism (QPFC-DA). (Hooda and Verma 2022) proposed a clustering method based on a Fuzzy-Gravitational Search Algorithm (GSA) for segmenting MRI brain images. This method includes a provision for adjusting the value of the parameter α used in calculating the gravitational constant. Fuzzy inference methods are utilized to control the parameter as the search progresses: a smaller value of α is preferred to achieve higher exploration, while a relatively higher value of α assists in achieving higher exploitation closer to the end of the search.

3.2.4.3 Hierarchical clustering

Hierarchical clustering is a method for image segmentation that organizes pixels or regions into a tree-like hierarchy. It is built on the idea of repeatedly merging or splitting clusters of pixels, resulting in hierarchical segments with varying levels of detail. For the purpose of segmenting hyperspectral colon biopsy images, (Kumar et al. 2018) developed an approach that combines hierarchical clustering (HC) with non-negative matrix factorization (NMF). The evaluation of the suggested approach shows that once the tissue components (cell types) have been accurately segmented at the pixel level, disease-state classification at the image or patient level becomes relatively simple. For the early detection of brain cancer, (Tamilmani and Sivakumari 2019) utilized Association Allotment Hierarchical Clustering (AAHC). Initially, noise in the microscopic images is filtered out using a mutual piece-wise linear transformation method. Then, AAHC analyzes the cells, tissues, and critical boundaries to segment the cancerous cells. After that, gray wolf optimization extracts different texture and statistical features from the segmented region to enhance classification accuracy. Finally, neural networks classify the selected features. The proposed approach improves efficiency and comes very close to perfect results. For segmenting the abnormal regions present in medical images, (Vupputuri et al. 2020); (Leo et al. 2021); and (Deepa et al. 2022) introduced symmetry superpixel-based hierarchical clustering (SSHC), Multi-Feature Hierarchical Clustering (MEHC), and HC with CNN approaches, respectively.
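A minimal sketch of the merge-tree idea follows, assuming region-level feature vectors (e.g., per-superpixel intensity and texture statistics) have already been computed; the SciPy linkage method and the cut into three segments are illustrative choices only.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

region_features = np.random.default_rng(0).random((100, 4))  # 100 regions, 4 features each
Z = linkage(region_features, method="ward")       # build the agglomerative merge tree
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 segments
print(labels.min(), labels.max())                 # labels run from 1 to 3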

3.2.5 Deep learning-based approaches

The remarkable evolution of AI techniques in various fields, specifically DL based techniques, has made a tremendous impact on MIS. This has helped the research community introduce highly effective methods for the automated and precise segmentation of medical images. DL approaches are constructed using Artificial Neural Networks (ANN), which can learn to execute intricate tasks through the analysis of substantial volumes of data. Numerous medical imaging applications have demonstrated the superior performance of these models compared to traditional segmentation approaches.

To this end, various DL based approaches were applied for MIS, such as Convolutional Neural Networks (CNNs), Fully Convolutional Networks (FCNs), Generative Adversarial Networks (GANs), U-Net, SegNet, Region Convolutional Neural Networks (R-CNN), and DeepLab. These architectures have been specifically devised to acquire hierarchical representations of the input image, which are subsequently utilized for image segmentation.

3.2.5.1 Supervised approaches

Supervised DL is a sophisticated approach to image segmentation that utilizes Deep Neural Networks (DNN) to learn and extract features from images and conduct pixel-wise classification. In this method, labeled data is used to train DL based approaches, with each pixel in the training images associated with a ground truth label identifying its class or segment. This kind of DL-based image segmentation is known as semantic segmentation.

Convolutional Neural Network (CNN): Generally, CNN based methods are the most popular and successful architectures among DL-based methods, particularly for computer vision problems. The CNN was initially developed by (Fukushima et al. 1983) in the seminal article on the "Neocognitron", which was based on the hierarchical receptive field model of the visual cortex developed by Hubel and Wiesel. Later, (Waibel et al. 1989) proposed CNNs with weights shared between temporal receptive fields and backpropagation training for phoneme recognition, while (Lecun et al. 1998) built a CNN architecture for document recognition. Figure 6 depicts the architecture of a CNN.

Fig. 6 Convolutional neural network architecture

Three different types of layers fundamentally make up CNNs: convolutional layers, which convolve a weighted kernel with the input to extract features; nonlinear layers, which apply an activation function to feature maps (typically elementwise) so the network can model non-linear tasks; and pooling layers, which substitute a small neighborhood of a feature map with some statistical summary and decrease spatial resolution. (Milletari et al. 2016) presented an approach for the segmentation of 3D MR images of the prostate that utilizes a volumetric convolutional neural network (V-CNN). The CNN is trained using MRI volumes depicting the prostate and subsequently acquires the ability to predict the segmentation of the entire volume in a single pass. The authors proposed a new objective function based on the Dice coefficient to address scenarios characterized by a significant disparity between the quantity of foreground and background voxels. Xu et al. (2017) used a CNN model trained on the ImageNet dataset to extract characteristics from histopathology images. An SVM is then used to cast segmentation as a classification task. This approach was applied in the analysis of the digital histology and colon cancer dataset from the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2014 Challenge. Ultimately, the technique achieved first place in the challenge, with a test accuracy of 84%.
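Since several of the works above optimize a Dice-based objective, the following is a minimal sketch of a soft Dice loss in the spirit of the V-Net formulation (binary case; the smoothing constant and tensor shapes are assumptions).

import torch

def soft_dice_loss(logits, target, eps=1e-6):
    # logits, target: (N, 1, H, W) or (N, 1, D, H, W); target entries in {0, 1}.
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.ndim))                 # sum over all but the batch axis
    inter = (probs * target).sum(dim=dims)
    denom = probs.pow(2).sum(dim=dims) + target.pow(2).sum(dim=dims)
    dice = (2.0 * inter + eps) / (denom + eps)         # per-sample soft Dice score
    return 1.0 - dice.mean()                           # minimize 1 - Dice

logits = torch.randn(2, 1, 32, 32)
target = (torch.rand(2, 1, 32, 32) > 0.5).float()
print(soft_dice_loss(logits, target).item())

Because the score is a ratio of overlap to total mass, this objective is insensitive to the foreground/background imbalance that plagues plain pixel-wise cross-entropy.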

To segment brain tissue on MRI scans obtained from multiple sclerosis (MS) patients participating in a sizable clinical trial conducted across multiple centers, (Narayana et al. 2018) employed deep CNNs. The patients underwent MRI scans that encompassed multiple channels, specifically T1-weighted, dual-echo fast spin echo, and fluid-attenuated inversion recovery images. The input to the CNN for tissue classification consisted of preprocessed images that underwent co-registration, skull stripping, bias field correction, intensity normalization, and de-noising. In (Ganaye et al. 2018), the authors introduced various methods to incorporate spatial constraints into the network with the aim of minimizing prediction discrepancies. The study employed a CNN architecture for segmenting brain MRI images into cerebral structures, utilizing various scales to acquire contextual information. The incorporation of spatial constraints into the CNN was achieved either by utilizing a distance-to-landmarks characteristic or by integrating a probability atlas. (G. Wang et al. 2019a, b, c) presented a study that puts forth an approach for estimating the imprecision of medical image segmentation through the utilization of CNNs that incorporate test-time augmentation. The method was evaluated on two publicly accessible datasets, namely the ISBI 2012 EM segmentation challenge and the PROMISE12 challenge.

The segmentation of cell nucleus images is performed in the study conducted by (Cui et al. 2019a, b) using a DNN. Initially, the image undergoes color normalization, followed by the use of an FCN to predict the nucleus and its corresponding edge. Following post-processing, the final segmented nucleus is generated. The experiment demonstrates the superiority of this approach compared to the previous method used in the field. Furthermore, it establishes the feasibility of properly segmenting medical images within a reasonable timeframe. Liver tumor segmentation is a challenging task due to the complex and diverse nature of CT images, which exhibit size, shape, and location variations. Meng et al. (2020) addressed these issues by concentrating on the segmentation of human liver tumors through the utilization of a CNN. Specifically, a three-dimensional dual-path multiscale CNN (TDP-CNN) was developed. Conditional Random Fields (CRF) are utilized to enhance the accuracy of segmentation results by eliminating false segmentation points. S et al. (2023) developed an integrated approach of a CNN with ResNet50 for segmenting lung CT images. The developed approach employed the ResNet50 architecture to distinguish between healthy lungs, lungs affected by COVID-19, and viral pneumonia-affected lungs.

Fully Convolutional Networks (FCN): Typically, the FCN is a common DL network for semantic image segmentation. The FCN was a significant development in computer vision because it permitted end-to-end pixel-wise image segmentation by adapting existing CNNs to the task. For the segmentation of isointense phase brain MR images, (Nie et al. 2018) developed a 3-D multimodal FCN model (CC-3-D-FCN). To better model minuscule tissue regions, the authors extended the typical FCN designs from 2-D to 3-D. To improve segmentation performance, the developed model combines information from the coarse and dense layers. A further convolutional layer was used to address the issue of bias signals, and batch normalization was used to improve and speed up network convergence. Roth et al. (2018) suggested a multi-class 3D FCN approach for MIS. The first-stage FCN minimizes the training loss by differentiating foreground and background voxels, which enables the FCN to roughly segment the regions, while the second-stage FCN focuses more on boundary region segmentation.
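As a usage-level sketch of the end-to-end pixel-wise idea, the snippet below runs torchvision's off-the-shelf FCN (a ResNet-50 backbone with a pixel-wise prediction head); the two-class setting and the untrained weights are illustrative assumptions, not the configuration of any cited model.

import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights=None, num_classes=2).eval()   # e.g., organ vs. background
with torch.no_grad():
    out = model(torch.randn(1, 3, 256, 256))["out"]        # (1, 2, 256, 256) per-pixel logits
pred = out.argmax(dim=1)                                   # per-pixel class map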

To improve the performance of FCNs in MIS, (Bi et al. 2018) introduced dual-path adversarial learning (DAL). Initially, DAL improves the extraction of ROI features by controlling the learning at various levels of complexity. Then, the features of DAL are added to the FCN training data to increase overall feature diversity. It was found that DAL improved FCN performance without applying any optimizations. Meanwhile, (Li et al. 2019) improved the FCN by adapting the U-Net architecture to enhance segmentation performance. The up-skip connection enhanced network connectivity and information flow between the encoding and decoding parts. Furthermore, to assist the network in learning richer representations, an inception module is employed in every block, and a cascade training technique is used to sequentially segment small tumor regions.

Based on a 3D deeply supervised FCN with concatenated atrous convolution (3D DSA-FCN), (B. Wang, Lei, Jeong, et al., 2019) introduced a method to segment MR prostate images. While simple dense predictions are used as deep supervision during the training phase, more discriminative features offer explicit convergence acceleration, and the concatenated atrous convolution extracts more global contextual data for reliable predictions. The evaluation reported high values for the Dice similarity, HD, and mean surface distance (MSD). To segment the prostate on MRI, (Wang, Lei, Tian, et al. 2019a, b, c) developed a 3D FCN with deep supervision and group dilated convolution (DSD). The deeply supervised method updates the hidden layer filters to prefer highly discriminative features, while for dense prediction a group dilated convolution that integrates multi-scale contextual information is applied to enhance the prediction accuracy of prostate regions in dense areas. Moreover, a combined loss function that includes cosine and cross-entropy terms was applied to increase segmentation accuracy.

In 3D MIS, common issues are model size optimization and the exploitation of volumetric information. To solve these issues, (Baldeon Calisto and Lai-Yuen 2020) suggested AdaEn-Net, a self-adaptive 2D-3D ensemble of FCNs (2D-3D FCN) for 3D medical image segmentation that combines volumetric data and adjusts to a specific dataset by optimizing the model's performance and size. The ensemble comprises a 2D FCN that extracts information from within slices and a 3D FCN that exploits volumetric context. A multi-objective evolutionary technique for maximizing the expected segmentation accuracy while reducing the number of network parameters is used to find the structure and hyperparameters of the 2D and 3D architectures. For the segmentation of brain tumors in MRIs, (Sun et al. 2021) presented a 3D FCN method. They use a multi-pathway architecture for effective feature extraction from the input images; the multi-pathway feature maps are then merged in transposed convolutional layers. The presented method proved its capability in segmenting the entire tumor region and its efficiency in segmenting 3D brain tumors. Tomar et al. (2022) proposed an approach named feedback attention network (FANet) that integrates the feature map from the current training epoch with the mask from the previous epoch. In subsequent convolutional layers, the learnt feature maps are given more focus using the prior epoch mask. During testing, the network can also refine its predictions iteratively.

To segment the cell nucleus, (Zhang et al. 2022a, b) suggested a hybrid approach based on an FCN and a GAN. The FCN is utilized for the initial segmentation of input images, and the GAN is then improved by adding splitting branches to the discriminator structure and combining the GAN with the splitting network. Finally, to accomplish high-precision segmentation of the nucleus image, the segmented image output from the FCN is supplied as input to the GAN. Cabeza-Gil et al. (2022) presented an FCN approach for automatic segmentation of the ciliary muscle in OCT images. Specifically, a U-Net with EfficientNetb2 backbones was trained for the segmentation task, and the FCN reported high performance in terms of accuracy. Wen et al. (2023) introduced an integrated approach for polyp segmentation in colonoscopy images that combines an atrous FCN with a ResNet50 backbone for region proposal and classification and a CNN for region refinement. To prevent overfitting and enhance segmentation performance, specific transfer learning schemes explore the effects of inter-domain and intra-domain weight transfers during the learning stage. These schemes are designed to optimize knowledge transfer between related domains.

UNet: U-Net is an important and significant architecture in computer vision, especially in MIA, due to its efficiency in precisely segmenting objects or areas inside images. (Ronneberger et al. 2015) proposed the U-Net method, which consists of two main components: an encoder and a decoder. Up-sampling operators are included in the decoder to restore the resolution of the input images. Additionally, the encoder's extracted features are merged with the up-sampled outcomes to achieve accurate localization. The U-Net method demonstrates excellent segmentation performance across many types of medical images. Zhou et al. (2018) introduced UNet++ to satisfy the demand for more precise MIS. The UNet++ architecture is basically a deeply supervised encoder-decoder network, with dense, nested skip pathways connecting the encoder and decoder sub-networks. The redesigned skip paths seek to minimize the semantic gap between the feature maps of the encoder and decoder subnetworks. UNet++ increased the intersection over union (IoU) by an average of 3.9 and 3.4 points over U-Net and wide U-Net, respectively. Isensee et al. (2018) proposed a self-adapting approach called nnU-Net ("no-new-Net") for MIS based on 2D and 3D U-Nets. This approach achieved the highest mean Dice scores across all classes and seven phases (except class 1 in BrainTumour). Meanwhile, (Alom et al. 2019) integrated a Recurrent U-Net (RU-Net) with a recurrent residual U-Net (R2U-Net) by extending the U-Net architecture with RCNNs and recurrent residual CNNs for MIS. This provides a number of benefits: a residual unit helps when training deep architectures, and feature accumulation with recurrent residual convolutional layers ensures superior feature representation for segmentation tasks. It also enables designing better U-Net architectures for MIS with the same number of network parameters.
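A minimal two-level sketch of the encoder-decoder-with-skips pattern described above follows; the channel sizes and depth are assumptions, far smaller than any practical U-Net.

import torch
import torch.nn as nn

def block(cin, cout):
    # Two 3x3 convolutions with ReLU, the basic U-Net building unit.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1, self.enc2 = block(1, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # up-sampling operator
        self.dec1 = block(32, 16)                          # 16 (skip) + 16 (up-sampled)
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                         # encoder features kept for the skip
        e2 = self.enc2(self.pool(e1))
        d1 = self.up(e2)
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # merge skip with up-sampled path
        return self.head(d1)

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])

The concatenation in the decoder is the localization mechanism noted above: fine spatial detail from the encoder is recombined with coarser, more semantic features after up-sampling.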

(Jha et al. 2019) developed a sophisticated U-shaped architecture, known as ResUNet++, for the purpose of polyp segmentation in medical image analysis. This model incorporates residual blocks, a squeeze-and-excitation module, Atrous Spatial Pyramid Pooling (ASPP), and an attention mechanism. Jha et al. (2020) developed the DoubleU-Net, which connects two U-Net architectures in sequence. The encoder phase incorporates ASPP at the end of each down-sampling layer to acquire contextual features. The evaluation outcome indicates that DoubleU-Net performs strongly in the segmentation of polyps, lesion boundaries, and nuclei. (J. Chen, Lu, et al., 2021) introduced the TransUNet approach, which utilizes a CNN as the initial phase of the encoder for extracting image patches, then employs a Transformer framework for extracting global context. The final integrated features in the decoder improved accuracy. To elevate semantic segmentation performance for diverse medical images, (Lin et al. 2022) introduced a Dual Swin Transformer U-Net (DS-TransUNet) approach. This approach integrates the benefits of a hierarchical Swin Transformer into both the encoder and decoder components of the conventional U-shaped architecture. To enhance the establishment of global dependencies among features at different scales, a Transformer Interactive Fusion (TIF) module is employed. Additionally, the Swin Transformer block was incorporated into the decoder, enabling a more in-depth exploration of long-range contextual information throughout the up-sampling process. Xu et al. (2023a, b) introduced a deeper and more compact split-attention U-shaped network (DCSAU-Net) approach that effectively leverages low-level and high-level semantic details using two innovative components: primary feature conservation and a compact split-attention block.

To enhance and optimize liver segmentation in CT images, (Liu et al. 2019a, b, c) integrated an enhanced U-Net neural network with graph cutting (GIU-Net). In the initial stage, an improved U-Net is utilized to segment the liver from the liver CT sequence and generate a probability distribution map of liver regions. The context information from the liver sequence images, together with the liver probability distribution map, is then utilized to build a graph cut energy function, which is minimized to achieve the segmentation. Gu et al. (2019) suggested a context encoder network (CE-Net) approach for capturing high-level information and preserving spatial information for various 2D medical images. CE-Net consists of three modules: (1) a feature encoder, (2) a context extractor built from dense atrous convolution (DAC) and residual multi-kernel pooling (RMP) blocks, and (3) a feature decoder. For feature extraction, a pre-trained ResNet block was used. As reported, the CE-Net approach improved segmentation performance in various medical tasks.

In MIS, there are discrepancies between the features propagating through the decoder network and the features passed from the encoder network, and applying Res paths involves some extra processing to produce the two feature maps. To reconcile these two incompatible sets of features, (Ibtehaz and Rahman 2020) developed the MultiResUNet model, which gives U-Net the capacity to perform multi-resolution analysis. They devised a compact, lightweight, and memory-efficient structure using Inception-like blocks as its building units. The developed model was tested on five distinct datasets: Murphy Lab, ISBI-2012, ISIC-2018, CVC-ClinicDB, and BraTS17. To improve the efficiency of brain tumor segmentation, (Rehman et al. 2020) introduced the BU-Net approach by integrating residual extended skip (RES) and wide context (WC) modules with a customized loss function into the baseline U-Net to extract contextual information and aggregate features. As reported, however, BU-Net loses local and context details among different slices.

To segment Environmental Microorganism (EM) images, (Zhang et al. 2021a, b) introduced a Low-cost U-Net (LCU-Net). Based on U-Net, Inception, and concatenation operations, the LCU-Net is an advanced CNN. Moreover, the segmentation outcomes are improved by applying a fully connected Conditional Random Field (Dense CRF) as post-processing to capture global information. The LCU-Net model not only outperforms the original U-Net in terms of performance but also lowers memory use from 355 to 103 MB, with improved values of the Dice, Jaccard, Precision, Recall, Accuracy, and volume overlap error (VOE) evaluation indices. Wang et al. (2021a, b) developed an HDA-ResUNet model to obtain high segmentation performance on various medical images. Initially, residual connections were added to all the layers to enhance the training process. Then, a channel attention (CA) block reduces the effect of noise and concentrates on essential regions. Finally, a hybrid dilated attention convolutional (HDAC) layer was used to enlarge the receptive field, extract global information, and minimize the impact of learned redundant features. Experimental evaluations showed that HDA-ResUNet achieved better performance than U-Net with fewer parameters.

Taking advantage of the Hierarchical Swin Transformer (HST), (Lin et al. 2022) developed the deep MIS model Dual Swin Transformer U-Net (DS-TransUNet), which, as noted above, incorporates the benefits of the HST into the encoder and the decoder of the conventional U-shaped architecture to improve medical image semantic segmentation quality. They added the Swin Transformer block to the decoder in addition to the encoder, proposed a dual-branch Swin Transformer in the encoder for extracting multi-scale features, and suggested the Transformer Interactive Fusion (TIF) module for integrating the multi-scale features from the encoder by creating long-range relationships between features of various scales through self-attention mechanisms. Cao et al. (2023) introduced a Swin Transformer-based U-shaped encoder-decoder approach (Swin-Unet) for MIS. The Swin Transformer block was used as the fundamental unit for feature representation and to enable interactive learning of long-range semantic information. Through extensive experimentation on multi-organ and cardiac segmentation tasks, Swin-Unet exhibited superior performance and generalization abilities.

SegNet: SegNet is a CNN architecture that was initially developed for semantic image segmentation in computer vision applications. Based on the encoder-decoder network of the SegNet layer, (Khagi and Kwon 2018) developed a DNN model for segmenting MRI images. To automatically segment brain tumor and sub-tumor regions, (Alqazzaz et al. 2019); (Hu et al. 2020); and (Dayananda et al. 2022) developed approaches based on the SegNet architecture. (Alqazzaz et al. 2019) applied the fully convolutional SegNet to 3D data sets for four MRI modalities (Flair, T1, T1ce, and T2). The multiple trained SegNet models are combined through post-processing to create four maximum feature maps, which further improves tumor segmentation. The most informative content is encoded into a feature representation by combining the maximal feature maps with the pixel intensity values of the original MRI modalities. Hu et al. (2020) presented a Brain-SegNet approach to automatically segment brain lesions, including brain tumors and ischemic stroke lesions, from 3D MRIs. Brain-SegNet uses 3D convolutional layers to effectively integrate both local details and 3D semantic context information. Brain-SegNet has a relatively fast processing time of around 0.5 s per MRI, approximately 50 times faster than previous methods; however, it may encounter difficulties when segmenting lesions with irregular shapes or unclear boundaries. Dayananda et al. (2022) presented a model for segmenting brain tissues in MRI by combining U-SegNet with fire modules and residual convolutions. The residual connections and the squeeze-expand convolutional layers of the fire module produce a lighter and more efficient brain MRI segmentation architecture. The residual unit enables efficient training of the deep architecture and a better representation of the features acquired from residual convolutions. Additionally, this approach offers an efficient architecture for segmenting brain MRI images with fewer network parameters and improved accuracy.
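The original SegNet's hallmark design choice is that the decoder up-samples using the max-pooling indices remembered by the encoder rather than learned deconvolutions; the following sketch shows just that mechanism (the channel sizes are assumptions).

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)   # remember the argmax locations
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 8, 32, 32)                 # an encoder feature map
pooled, indices = pool(x)                     # (1, 8, 16, 16) plus the pooling indices
restored = unpool(pooled, indices, output_size=x.size())
print(restored.shape)                         # torch.Size([1, 8, 32, 32])

Reusing the indices places each value back at its original spatial position, preserving boundary detail at no parameter cost, which is why SegNet-style decoders are comparatively lightweight.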

To identify diseased lung tissue in images of COVID-19 patients' lungs, (Saood and Hatem 2021) applied two well-known DL networks, SegNet and U-Net, both as multi-class segmentation to identify the type of infection in the lung and as binary segmentation to distinguish between infected and normal lung tissue. The results demonstrate SegNet's greater performance as a binary segmenter compared to the other techniques, while U-Net performs better at multi-class segmentation. Han et al. (2023) proposed an HWA-SegNet approach to address challenges in skin lesion image segmentation such as insufficient feature expression in small-sized samples, irregular shapes of segmented targets, and inaccurate judgment of edge texture. The approach involves several steps: first, an improved discrete Fourier transform (DFT) is utilized to analyze the image features, and multi-channel data is expanded for each image. Second, a hierarchical dilated analysis module is built to capture the semantic features in the multi-channel data. Finally, the pre-predicted results are fine-tuned based on a weight adjustment structure with fully connected layers to achieve higher accuracy in the predicted results.

Region Convolutional Neural Network (R-CNN): R-CNN is a CNN architecture that has been utilized for object detection and segmentation tasks. To create bounding boxes, the R-CNN architecture employs a selective search procedure to produce region proposals. After being warped into standard squares, these regions are passed to a CNN, which generates a feature vector map. Liu et al. (2018) applied a Mask R-CNN method with ResNet101 and FPN backbones to tackle challenges in lung nodule segmentation, including low-quality CT images, limited annotated data, and the complex shapes and unclear contours of pulmonary nodules. For effective, accurate, and automatic melanoma region segmentation in dermoscopic images, (Nida et al. 2019) presented a deep RCNN and fuzzy C-means (FCM) approach. The three stages of the presented approach are skin refinement, localization of the melanoma tissue, and segmentation of the melanoma. The various affected regions are accurately detected using the RCNN in the form of frames, which simplifies localization via FCM clustering. For localizing the optic nerve head (ONH) and segmenting the optic disc/cup in retinal fundus images, (Almubarak et al. 2020) proposed a two-stage Mask R-CNN. Initially, they locate and crop around the ONH. To further improve the detection process, they combine the cropped image with the original training image and train a separate detection network based on various scales for the R-CNN anchors. The cropped images from the first stage are utilized as input to the second stage, which is trained with a weighted loss to achieve the final segmentation.
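As a usage-level sketch, torchvision ships a Mask R-CNN with a ResNet-50 + FPN backbone that is similar in spirit to the models cited above; the two-class setting, untrained weights, and score threshold here are illustrative assumptions.

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights=None, num_classes=2).eval()  # lesion vs. background
with torch.no_grad():
    preds = model([torch.rand(3, 256, 256)])          # list of images scaled to [0, 1]
boxes = preds[0]["boxes"]                              # per-instance bounding boxes
masks = preds[0]["masks"]                              # per-instance soft masks
keep = preds[0]["scores"] > 0.5                        # simple confidence filter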

For medical disc image segmentation, (Vania and Lee 2021) presented a multi-stage optimization Mask-RCNN (MOM-RCNN) approach. The MOM-RCNN approach has four phases: Backbone, Neck, DenseHead, and ROIHead (Region of Interest Head), and is trained using stochastic gradient descent and adaptive moment estimation (Adam) with T1 and T2 images. For MIS, (Felfeliyan et al. 2022) suggested a Self-Supervised Mask-RCNN (SS-MRCNN) model that learns both semantic and pixel-level information. The SS-MRCNN employs just one network without the need for multiple inputs. SS-MRCNN pre-training provides a method for extracting pertinent features from MRI datasets without the need for annotations, and its structure has been modified to locate the distortion region and restore the actual image pixels. Compared to training from scratch, SS-MRCNN increased the Dice score by 20%. The SS-MRCNN is simple, efficient, and well suited to a variety of medical image processing tasks. To tackle the limited labeled data and intra-class (inconsistency and indistinction) issues in segmenting PET/MRI pancreas tumor images, (Yao et al. 2023) proposed an approach called transferred DenseSE-Mask R-CNN (TDSMask R-CNN), which suppresses irrelevant information and reduces false positives in tumor area segmentation. Additionally, the accurate tumor location from PET images is transferred to the MRI training model to guide Dense-SE network learning. This helps mitigate the small-label-sample problem and minimize network overfitting. Overall, the TDSMask R-CNN approach significantly improved the accuracy of pancreatic tumor segmentation.

DeepLab: DeepLab is a CNN architecture that has been specifically developed for image segmentation tasks. It has been used extensively and has shown exceptional performance in many image processing applications, notably in the realm of MIS. The DeepLab designs have gained recognition for their capacity to effectively capture intricate information inside segmented regions while retaining a high level of computational efficiency. Ma et al. (2018) provided a DL based technique for Al–La alloy microscopic image segmentation. For effective image segmentation, they built a deep CNN based on DeepLab and applied a local processing approach based on a symmetrical overlap-tile technique that allows high-resolution analysis of microscopic images and accomplishes seamless segmentation. Symmetrical rectification was used to improve the precision of the outcomes with 3D data. The offered technique has the drawback of not segmenting well when two dendritic structures are close to each other. Tang et al. (2020) developed a Faster R-CNN and DeepLab approach to overcome the challenges of fuzzy boundaries, complex liver anatomy, the presence of pathologies, and diversified shape in liver image segmentation. An enhanced Faster R-CNN is used to detect the approximate liver position; the resulting images are then processed and supplied to DeepLab to obtain the liver contour. This approach improved liver image segmentation.
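At the core of DeepLab are atrous (dilated) convolutions, which enlarge the receptive field without reducing spatial resolution; the following is a minimal sketch of an ASPP-style module with parallel dilation rates (the rates and channel counts are illustrative assumptions).

import torch
import torch.nn as nn

class TinyASPP(nn.Module):
    def __init__(self, cin=64, cout=64, rates=(1, 6, 12)):
        super().__init__()
        # Parallel 3x3 convolutions, each with a different dilation rate;
        # padding equals the rate so the spatial size is preserved.
        self.branches = nn.ModuleList(
            nn.Conv2d(cin, cout, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(cout * len(rates), cout, 1)  # fuse the branches

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

print(TinyASPP()(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])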

To improve the performance of MIS, (Wang and Liu 2021); (Azad et al. 2022); (Shia et al. 2022); and (J. Yang et al. 2023) proposed approaches utilizing the DeepLabv3 architecture. Wang and Liu (2021) presented a multi-scale-input DeepLabv3+ network to enhance the performance of medical image detection and segmentation of malignant regions of gastric cancer. In comparison to SegNet and Faster-RCNN models, the DeepLabv3+ assessments of sensitivity, specificity, accuracy, and Dice coefficient are 91.45%, 92.31%, 95.76%, and 91.66%, respectively, and the model's parameter scale is also significantly decreased. For MIS, (Azad et al. 2022) suggested the TransDeepLab approach, a transformer-based architecture. To extend DeepLabv3+ and model the Atrous Spatial Pyramid Pooling (ASPP) module, they use an HST with shifted windows. The proposed approach outperforms or is on par with the majority of recent efforts that combine CNN-based and Vision Transformer techniques, and it significantly reduces model complexity. In order to speed up diagnosis and enhance image quality, (Shia et al. 2022) developed a DeepLabv3+ model with a ResNet-50 decoder to recognize and segment malignant Breast Imaging Reporting and Data System (BI-RADS) regions on breast ultrasound images. The developed model demonstrated semantic segmentation performance with mean accuracy and intersection over union (IU) of 44.04% and 34.92%, respectively. (J. Yang et al. 2023) presented TSE DeepLab, based on the DeepLabv3 framework, to enhance the performance of MIS; it retains the original atrous convolution for local feature extraction. The feature maps after the backbone are then converted into visual tokens, which are fed into a Transformer module to improve global feature extraction. Squeeze-and-excitation components are also included after the Transformer module to prioritize the importance of channels, enabling the model to focus on significant pixel features for each channel.

3.2.5.2 Unsupervised approaches

The approach of unsupervised DL in image segmentation entails the training of neural networks to autonomously identify and segment objects or regions of interest in images, without relying on explicit pixel-level annotations. Unsupervised segmentation approaches can prove very advantageous in situations where the acquisition of labeled training data is cost-prohibitive, impractical, or challenging.

Generative Adversarial Network (GAN): GANs have emerged as a significant advancement in deep networks, garnering considerable interest in the scientific community due to their extensive range of applications in medical imaging. In contrast to conventional DNNs, GANs are a kind of DNN in which two networks are trained concurrently. For the automatic semantic segmentation of multiple spinal structures, (Z. Han et al. 2018) presented the Spine-GAN recurrent generative adversarial network. Atrous convolution, i.e., convolution with holes, specifically addresses the significant variety and variability of complicated spine structures, and the spatial pathological relationships between normal and abnormal structures are dynamically modeled. Spine-GAN uses a discriminative network that can rectify predicted errors and global-level adjacency to provide reliable performance and effective generalization. Spine-GAN scored high values in accuracy, Dice coefficient, sensitivity, and specificity. For the purpose of segmenting 3D multi-modal images, (Mondal et al. 2018) introduced a model using GANs. This approach avoids over-fitting by training to distinguish between real and fake patches produced by a generator network. Additionally, it offers a method for few-shot learning that eliminates the requirement for an initial pre-trained network by making use of GANs' semi-supervised learning capabilities.
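A minimal sketch of the two concurrently trained networks follows: the discriminator learns to separate real from generated samples while the generator learns to fool it. The architectures, data, and hyperparameters here are toy assumptions, not those of any cited model.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 28 * 28) * 2 - 1            # stand-in batch of "real" images
z = torch.randn(32, 16)                           # latent noise

# Discriminator step: push real toward 1 and generated toward 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 on generated samples.
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

In segmentation GANs such as those below, the "generator" is the segmentation network and the discriminator judges predicted masks against ground-truth masks rather than raw images.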

For MIS, some authors developed multi-stage GANs. (Xue et al. 2018) suggested a GAN with a multi-scale loss architecture called SegAN, where an FCN is used to construct segmentation label maps. SegAN drives the critic and segmentation units to acquire global and local features representing both the long- and short-range spatial correlations between pixels. The evaluations reported that SegAN is significantly more efficient and produces better performance than a single-scale loss or the traditional pixel-wise softmax loss. Similarly, for segmenting nucleus images, (Pandey et al. 2020) integrated two-stage GANs, where the first GAN stage produces a synthesized binary mask and the second GAN stage produces the synthetic image. The two GANs are combined to produce synthesized image-mask pairs, which are utilized to expand the training dataset for the segmentation process and also enhance the performance of the conventional segmentation approach. Furthermore, (Khaled et al. 2022) developed a multi-stage GAN model for brain segmentation. This model creates a coarse outline for the background and brain tissues and provides a more precise shape for the white matter (WM), grey matter (GM), and cerebrospinal fluid (CSF). The coarse and fine outlines are combined to obtain a good outcome. The model increased the DC accuracy by 5% and ran 2.69 to 13.93 min faster than the other compared models, using only few-shot learning and a small amount of labeled data.

For segmenting colorectal tumor tissues from CT images, (Liu et al. 2019a, b, c) proposed a label assignment generative adversarial network (LAGAN) approach. LAGAN assigns labels to the deep network outputs, in addition to providing a distinct post-processing function to improve the segmentation outcomes of the CNNs. First, FCN32 and U-Net independently segment the colorectal tumors on the CT scans to produce probabilistic maps. LAGAN then labels the probabilistic maps, after which the binary segmentation results are obtained. Rezaei et al. (2020) introduced RNN-GAN, a recurrent generative adversarial architecture, to tackle the imbalanced pixel-label problem in MIS. RNN-GAN relies on a generator and a discriminator: the generator is trained over a series of medical images to learn the corresponding segmentation label map, and the discriminator is trained to distinguish the ground truth from the generated segmentation image. To improve temporal consistency and provide both inter- and intra-slice representations of the features, bidirectional LSTM units were adopted in both the generator and the discriminator. In addition, the study of various architectural options and losses supports better semantic segmentation outcomes.

By rebuilding the discriminator with an EM loss, (Tan et al. 2021) presented a DL lung segmentation model (LGAN) using a GAN. The segmentation of the lung is carried out through an adversarial contest between a network that generates segmentation masks and a discriminator network that distinguishes the actual masks from the generated ones. Compared with a single network for image segmentation, this adversarial setup makes the generated masks more accurate and convincing. Additionally, the LGAN model can be used with many types of segmentation networks. Sarker et al. (2021) developed SLSNet, a simple and effective GAN based skin lesion segmentation model. A 1-D kernel factorized network, multi-scale aggregation, and position and channel attention methods are the components of the GAN model that was modified to develop SLSNet. The 1-D kernel factorized network minimizes the computational cost of 2D filtering, while the position and channel attention components improve the ability to discriminate between lesion and non-lesion features in the spatial and channel dimensions, respectively. To aggregate the coarse-to-fine characteristics of the input skin images and decrease the impact of artifacts, a multi-scale block is also utilized.

The authors in (Kawahara et al. 2022) presented an auto-segmentation model utilizing a GAN with a U-Net for magnetic resonance (MR) images of head and neck (HN) cancer. Because the images required more spatial information, the generator employed a U-Net, and the discriminator, which the GAN utilized to distinguish between the ground truth and the segmentation produced by the generator, was trained with the addition of patched images. By providing learned parameters and making precise distinctions between actual and fake segmentations, the presented 2.5D GAN contributes to efficient segmentation. Compared to a 3D GAN, a 2.5D U-Net, and a 3D U-Net, the 2.5D GAN considerably increases the segmentation accuracy for most organs at risk (OAR) of HN patients. For brain stroke lesion segmentation, (Wang et al. 2022a, b) presented the Consistent Perception GAN (CPGAN) model. To collect multi-scale feature information, a Similarity Connection Module (SCM) is employed in CPGAN to enhance the details of the lesion region in the segmentation; the SCM can use a weighted sum to selectively aggregate the features at each position. Additionally, CPGAN utilizes a consistent perception strategy to improve the accuracy of brain stroke lesion prediction for unlabeled data. Moreover, an auxiliary assistant network is built to encourage the discriminator to pick up useful feature representations that are frequently lost during training; the discriminator and assistant network are used to judge whether segmentation results are real or fake. Altun Güven and Talu (2023) introduced an improved GAN approach called Supervised SimDCL (SSimDCL) for segmenting brain MRI images. The initial study conducted on SSimDCL demonstrated an improvement in dataset resolution, and its performance was found to be satisfactory in comparison with other commonly utilized approaches, including VolBrain, CycleGAN, CUT, FastCUT, DCLGAN, and SimDCL.

Deep Embedded Clustering (DEC): DEC has garnered significant attention in MIS due to its remarkable performance in recent years. To this end, (Enguehard et al. 2019) presented a flexible framework for semi-supervised learning that integrates supervised methods based on deep CNNs to learn feature representations with a deeply embedded clustering approach that allocates data points into clusters based on their probability variations and the feature representations discovered by the networks. The semi-supervised learning approach based on deeply embedded clustering, known as SSLDEC, is designed to learn feature representations through an iterative process that alternates between labeled and unlabeled data points and generates target distributions based on the predictions made. Throughout this iterative process, the algorithm leverages labeled samples to keep the model consistent and aligned with the provided labels, while simultaneously learning to enhance its feature representations and predictions. The SSLDEC algorithm requires only a limited number of hyper-parameters, obviating the need for extensive labeled validation sets and effectively mitigating a significant constraint observed in numerous semi-supervised learning algorithms. Furthermore, (Pathan and Tripathi 2020) introduced a deep clustering architecture in conjunction with image segmentation techniques for MIA. The primary concept revolves around the utilization of unsupervised learning techniques to cluster images based on the severity of the disease in the subject's sample. Subsequently, these images are segmented to emphasize and delineate regions of interest. The proposed architectural design outperforms the results achieved by UNet and DeepLab on two datasets, while also exhibiting a parameter count that is less than half that of these existing models. The suggested architectural framework has the potential to facilitate timely detection and minimize the need for excessive human involvement in the labeling process, which can often be laborious and monotonous.
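For reference, the following NumPy sketch shows the soft cluster assignment and the sharpened target distribution used in deep embedded clustering (a Student's t kernel with one degree of freedom); the embeddings and cluster count here are random stand-ins, not outputs of any cited encoder.

import numpy as np

def dec_q(z, centers):
    # Soft assignment q_ij of embedding z_i to cluster j (Student's t, df = 1).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def dec_target(q):
    # Sharpened target distribution p that emphasizes confident assignments.
    w = q ** 2 / q.sum(axis=0)            # square, then normalize by cluster frequency
    return w / w.sum(axis=1, keepdims=True)

z = np.random.default_rng(0).random((200, 10))   # stand-in embeddings from an encoder
centers = z[:3].copy()                           # e.g., k-means-initialized centers
q = dec_q(z, centers)
p = dec_target(q)                                # minimizing KL(p || q) drives training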

3.3 Medical image feature extraction

Medical image feature extraction is the process of identifying and capturing meaningful information from medical images and representing that information in a lower-dimensional space (Kumar and Bhatia 2014). It involves analyzing the images to identify certain patterns or features, which can be used in segmentation and classification processes to diagnose diseases, track disease progression, or guide treatment decisions. Figure 7 shows the process flow of feature extraction in MIA. Figure 8 presents a taxonomy of various approaches for feature extraction in MIA, including color based, texture based, shape based, and DL based. Table 4 illustrates the comparative analysis of medical image feature extraction approaches.

Fig. 7 The process flow of medical image feature extraction

Fig. 8 Taxonomy of feature extraction approaches

Table 4 Comparative analysis of medical image feature extraction approaches

3.3.1 Color based approaches

Color-based feature extraction approaches are commonly utilized in MIA to extract meaningful information from the color distribution of pixels within an image. These approaches can be used to identify and quantify abnormalities in tissue, such as the presence of lesions or blood vessels. In order to extract color features from medical images, a particular color space must be chosen and a conversion process followed. The color space is then quantized into a color histogram, which presents global features of the image. These features are transformed into feature vectors, which are supplied as input to the segmentation and classification processes. Various color spaces are typically utilized, including RGB, HSV, and Lab, and common color-based feature extraction approaches used in medical imaging include color histograms, color moments, and color coherence vectors (CCV). To this end, various approaches were proposed to extract ROIs from medical images, such as (Akakin and Gurcan 2012); (Mercan et al. 2016); (Rashighi and Harris 2017). In these approaches, the Lab color histogram is applied to extract color features from the prospective ROIs, and the medical images are then segmented and classified utilizing the resulting feature vector. All color modes perceivable by the human eye are accounted for in the Lab color space; thus, the features extracted from the Lab space represent both intensity and color data.

HSV-based color properties are closer to how colors actually appear to the human eye (Smith 1978). Its three components are Hue, Saturation, and Value. HSV is only one of the color spaces utilized in the articles (Homeyer et al. 2013); (Bautista et al. 2014). This is because a single color space can only display a limited range of colors, and histopathology images are artificially stained to emphasize interconnected cellular structures; integrating several color spaces therefore allows the extraction of additional color information. Both studies employ color moments to isolate color characteristics. First-order moments (mean, median, etc.) and second-order moments (variance, standard deviation, etc.) are often utilized in histopathological medical images. Another benefit is that, unlike the color histogram, the color distribution in the image can be expressed without having to vectorize the features.
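A minimal sketch of the histogram pipeline described above follows: convert to HSV, build a quantized joint histogram, and flatten it into a feature vector; the bin counts and the random stand-in image are illustrative assumptions.

import cv2
import numpy as np

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)  # stand-in BGR image
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# 8x4x4 joint histogram over Hue, Saturation, Value
# (OpenCV stores 8-bit Hue in the range 0..180).
hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 4, 4], [0, 180, 0, 256, 0, 256])
feature_vector = cv2.normalize(hist, None).flatten()   # 128-dimensional color feature
print(feature_vector.shape)                            # (128,)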

3.3.2 Texture and shape based approaches

Texture and shape feature extraction are important techniques used in MIA. Texture analysis involves the identification and quantification of patterns in the intensity variations of an image, while shape analysis refers to the identification and quantification of the geometric properties of structures within an image. Various approaches are available for texture and shape feature extraction, ranging from simple statistical approaches to more complex machine-learning approaches, including GLCM, LBP, the Gabor filter, and morphological features. The selection of a suitable approach depends mainly on the specific application and the characteristics of the imaging data.

3.3.2.1 Gray level co-occurrence matrix (GLCM)

The GLCM technique is widely utilized for extracting texture features from digital images. The GLCM process involves the measurement and analysis of the spatial correlations among pixel values inside an image, enabling the characterization of texture and pattern features. The GLCM is extensively used in diverse image analysis applications, including image classification, segmentation, and texture analysis. Mall et al. (2019) proposed an ML approach for abnormality detection in bone X-ray images using GLCM texture features. For anomaly detection, several ML approaches are employed, such as LBF SVM, linear SVM, logistic regression, and decision trees. Five statistical metrics (sensitivity, specificity, precision, accuracy, and F1 score) were utilized for analyzing the performance of this approach and showed considerable improvement. Ahmed (2020) presented a relevance feedback retrieval method (RFRM) for enhancing and improving content-based medical image retrieval (CBMIR) using GLCM texture features. Each image was represented by eighteen features extracted utilizing color moments and GLCM texture features, with eight common similarity coefficients applied as similarity indicators. The Kvasir dataset, which consists of 4,000 images separated into eight categories, was utilized for evaluating this approach.

For content-based medical image retrieval, (Madhu & Kumar, 2022) introduced a hybrid feature extraction approach that employs segmentation, clustering, and GLCM texture features. The MIAS dataset was utilized to test the introduced approach, and the findings demonstrate that it performs better in terms of retrieval accuracy than previous approaches. Karthikamani and Rajaguru (2022) applied GLCM texture features and statistical approaches for extracting features from normal and cirrhotic liver US images, utilizing multiple classifiers, such as Cuckoo Search (CS), Gaussian Mixture Model (GMM), Particle Swarm Optimization (PSO), Firefly, Elephant Search, and Dragonfly, to normalize and classify the extracted GLCM and statistical features. The GMM classifier fared better than the other classifiers, obtaining the greatest accuracy value and the lowest error rate. For identifying abnormalities in X-ray images, (Narayan et al. 2023) proposed a Fuzzy Net classification approach utilizing GLCM features and the PSO technique. To evaluate the performance, a confusion matrix and graphs of the model's learned membership functions were utilized.
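A minimal sketch of GLCM texture features with scikit-image follows (function names as in skimage 0.19 and later); the distances, angles, and chosen properties are typical illustrative settings, not those of any cited paper.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in grayscale patch

# Co-occurrence counts for pixel pairs at distance 1, horizontally and vertically.
glcm = graycomatrix(image, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

# Summarize the matrix with four standard texture descriptors.
features = [graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")]
print(features)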

3.3.2.2 Local Binary Patterns (LBP)

The LBP technique is a commonly used approach for extracting texture features in the field of image analysis. This method demonstrates a high degree of efficacy in delineating the specific patterns and textures inherent to a given image. The LBP process involves an analysis of the relations between the intensity level of a given pixel and those of its neighboring pixels. For the retrieval of biological images, (Sucharitha and Senapati 2019) introduced a local extreme co-occurrence edge binary pattern (LECoEBP) approach. Local extreme edge binary patterns (LEEBP) provide texture features, and the GLCM determines the co-occurrence of the binary patterns. In comparison to other approaches, the integration of LEEBP and GLCM provided considerable outcomes. (Kaplan et al. 2020) utilized two separate feature extraction approaches (nLBPd and LBP) to classify the most prevalent forms of brain tumors, including gliomas, meningiomas, and pituitary tumors. With the feature matrices produced by nLBPd and classical LBP, the classification process was carried out utilizing KNN, ANN, RF, A1DE, and LDA classifiers. The highest brain tumor classification accuracy, 95.56%, was obtained with nLBPd (d = 1) feature extraction and the KNN classifier.

To detect, classify, and diagnose skin lesion diseases such as malignant, benign, and atypical lesions, (Senan and Jadhav 2021) proposed an approach consisting of several stages: (1) preprocessing to improve the input images using a Gaussian filter and morphological methods, (2) separating the lesion region from the rest of the image using the active contour technique, (3) extraction of features from the RoI using LBP, and (4) feeding the extracted features as input to the SVM and K-NN classification techniques. For embedding and extracting important features from the host image, (Laouamer 2022) developed an informed watermarking approach for medical images based on local binary patterns (LBP).
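A minimal sketch of LBP texture features with scikit-image follows: per-pixel codes are computed and then summarized as a normalized histogram; the neighborhood radius, number of points, and the "uniform" variant are illustrative assumptions.

import numpy as np
from skimage.feature import local_binary_pattern

image = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in grayscale patch
P, R = 8, 1                                       # 8 neighbors on a radius-1 circle
codes = local_binary_pattern(image, P, R, method="uniform")

# Uniform LBP yields P + 2 distinct codes; their histogram is the texture feature.
n_bins = P + 2
hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
print(hist)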

3.3.2.3 Gabor filter

Gabor filters have gained significant popularity as a prominent method for extracting image features, especially in the domains of texture analysis and pattern recognition. This method aims to identify the texture details present in an image via the analysis of local spatial frequency and orientation features. To this end, (Lan et al. 2018) utilized Gabor and Schmid filters to extract texture features for medical image retrieval, then divided the filtered images into small non-overlapping patches. Finally, the bag-of-words (BoW) model was utilized to obtain a feature representation of the medical image. Khan et al. (2019) presented an improved Gabor filter bank approach employing the Cuckoo Search metaheuristic algorithm to select the most effective Gabor filters for breast cancer detection. The presented approach outperformed existing approaches and demonstrated high specificity and low false positives while achieving almost 100% accuracy. (Nitish et al. 2020) conducted a comparative analysis of several classification approaches for distinguishing tumorous from non-tumorous MRI images, utilizing the thresholding method for segmentation and the Gabor filter for feature extraction. The extracted features are fed into SVM, KNN, and logistic regression classifiers. The experimental analysis demonstrated the SVM classifier's superior robustness compared to the others.

For retrieving glioma brain tumors from MR images, (Venkatachalam et al. 2021) presented a CBMIR system based on the Gabor Walsh-Hadamard transform (GWHT) feature extraction approach. Initially, several filtering techniques, including mean, median, conservative, and Crimmins speckle removal filters, were used to remove noise. Then, GWHT was utilized for extracting the features. Finally, Fuzzy C-means with the Minkowski distance was utilized to evaluate similarity in order to efficiently diagnose brain tumors using the extracted features. Barshooi and Amirkhani (2022) utilized convolutional DL methods with several filter banks, including the Gabor and LoG filters, to improve the performance of classifying chest X-ray (CXR) images into non-COVID-19 and COVID-19 classes. A standard data augmentation method with GANs was utilized to address the data limitation issue, and various filter banks such as Sobel, Laplacian of Gaussian (LoG), and Gabor filters were applied for deeper feature extraction.
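A minimal sketch of a small Gabor filter bank with OpenCV follows, summarizing the filter responses at several orientations into simple statistics; all kernel parameters are illustrative assumptions.

import cv2
import numpy as np

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8).astype(np.float32)

features = []
for theta in np.arange(0, np.pi, np.pi / 4):       # four orientations: 0, 45, 90, 135 deg
    kernel = cv2.getGaborKernel(ksize=(15, 15), sigma=3.0, theta=theta,
                                lambd=8.0, gamma=0.5, psi=0.0)
    response = cv2.filter2D(image, cv2.CV_32F, kernel)
    features += [response.mean(), response.std()]  # per-orientation response statistics
print(features)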

3.3.3 Deep learning-based approaches

Image feature extraction has been significantly transformed by the advent of DL, which has facilitated the automated learning of structured models from raw pixel data. CNNs are widely used in DL for extracting image features due to their capacity to effectively capture and depict intricate visual patterns. Because of the time and cost issues of medical image feature extraction and analysis in existing models, (Liebgott et al. 2018) developed the ImFEATbox toolbox for extracting and analyzing image features from various imaging modalities such as CT, MRI, and PET. Both global and local features, as well as feature descriptions, are included in the toolbox. For the purpose of effectively representing and classifying images of histological tissue, (Sari and Gunduz-Demir 2019) presented an unsupervised feature extractor approach. Instead of considering the features of every image location, the feature extractor identifies the most significant sub-regions in an image and quantizes them. The approach uses DL to leverage the distribution of these quantizations to describe and classify images, extracting a collection of features from the image data and quantifying the most important sub-regions. It builds a deep belief network of restricted Boltzmann machines (RBMs), determines the activation values of the hidden unit nodes, and extracts features from the input data.

3.3.3.1 Convolutional neural network (CNN)

CNNs effectively extract features from medical images. These features can be fed into further MIA stages such as segmentation and classification, or into fully connected layers. Several popular deep architectures, including Inception, DenseNet, and ResNet, have been developed for CNN-based feature extraction. Based on deep CNN architectures, Öztürk (2020), Mohite and Gonde (2022), and Chakraborty et al. (2023) developed approaches for extracting image features. Öztürk (2020) introduced a CNN approach with the Synthetic Minority Over-sampling Technique (SMOTE) to minimize the semantic gap between low-level features and high-level semantics for imbalanced medical image datasets. Initially, a CNN is applied to automatically capture discriminative features from images; the extracted features are then utilized to create hash IRMA codes. Mohite and Gonde (2022) proposed a CNN-based approach to enhance the classification and retrieval of medical images. This approach leverages multiple hidden layers to extract deep features at various levels of abstraction; these deep features are then used to compute the similarity index between images through distance measures and similarity indices. Because handcrafted features embed incomplete knowledge about brain function, which causes a loss of important information, Chakraborty et al. (2023) developed a CNN approach for automatic, data-driven feature extraction from 3D brain MRI images. The extracted features are then utilized as endophenotypes in genome-wide association studies (GWASs) for detecting associated genetic variants. To enhance the classification performance of a multilevel transfer learning (MLTL) framework on digital breast tomosynthesis (DBT) images, Aswiga et al. (2021) developed a feature extraction-based transfer learning (FETL) approach, which employed different feature extraction methods including CNCF, GLCM, and a multi-input perceptron.
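
As a minimal sketch of this usage (our illustration; the ResNet-18 backbone and 224 × 224 input are assumptions, not taken from the cited works), a pre-trained CNN can be turned into a feature extractor by removing its classification head:

```python
# A minimal CNN feature-extraction sketch using torchvision; the backbone
# choice and input size are illustrative assumptions.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()     # drop the classification head
backbone.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed medical image
    features = backbone(x)            # 512-D deep feature vector

print(features.shape)                 # torch.Size([1, 512])
```

The resulting vector can then be passed to a classical classifier (e.g., SVM or KNN) or to fully connected layers, as in several of the hybrid approaches reviewed below.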

Several studies, such as (Wang et al. 2020; Zhang et al. 2021a, b; Gao et al. 2022), presented specialized deep CNN architectures tailored to their specific learning objectives. To extract features from breast histopathological images, Wang et al. (2020) utilized an integrated approach combining a convolutional neural network (CNN) and a graph convolutional network (GCN). The CNN extracts high-level features, which are used to build an adaptive graph; the GCN then uses this graph to extract spatial features for the classification task. Zhang et al. (2021a, b) and Gao et al. (2022) utilized CNN features to represent medical images for generating a superpixel representation, which is further utilized to produce a graph embedding representation. The graph embedding attributes, together with the CNN features, are used as input for a graph convolutional network (GCN) to perform cancer detection and region localization.

Inception Networks: Several medical image analysis tasks, including detection, segmentation, and classification, have been effectively implemented using CNNs from the Inception family. Arvaniti et al. (2018) analyzed manually annotated images from 886 patients to examine how well Inception-V3 could grade prostate cancer; according to the results, pathologists might benefit from deep CNN architectures in practical applications. To classify image tiles as adenocarcinoma, adenoma, or non-neoplastic, Iizuka et al. (2020) utilized Inception-V3. Medical images are divided into sets of tiles, and a long short-term memory (LSTM) network aggregates the Inception-V3 features. The authors trained and validated the Inception-V3 and LSTM models using an annotated collection of medical images that included colon and stomach cancer diagnoses. To detect colorectal cancer, Wang et al. (2021a, b) introduced an Inception-V3 approach, which achieved the best performance reported to date on the largest sample sizes and data sources.

AlexNet Networks: The eight layers that make up AlexNet comprise convolutions, max-pooling, and fully connected layers. Dropout layers are used to prevent overfitting, and every layer except the output employs rectified linear units (ReLU) instead of the hyperbolic tangent (tanh) activation. Jiménez del Toro et al. (2017) classified high-grade Gleason patterns in prostate cancer utilizing AlexNet. The research compared the accuracy and speed of two neural networks, AlexNet and GoogleNet, and found that AlexNet achieved similar results in less time. Ren et al. (2018) used the AlexNet model to learn the characteristics of image patches to predict the survival outcomes of patients with prostate cancer. The extracted features, combined with genetic sequences, serve as input for an LSTM approach that captures the spatial correlation between adjacent image patches.

ResNet Networks: Residual Networks, often known as ResNets, are a specific neural network architecture designed to tackle the difficulties associated with training neural networks that have a large number of layers. Campanella et al. (2019) utilized a ResNet-34 architecture for grading prostate cancer based on a multiple-instance learning approach; the most important image tile characteristics are combined using an RNN model. Chen et al. (2021) constructed a ResNet-50 approach to extract important morphological features of the tumor microenvironment from whole slide images (WSI) divided into patches. The features are then used to construct a graph embedding for the WSI, which is subsequently fed into a GCN model for cancer patient survival prediction. Cheng et al. (2021) utilized ResNet-50 for diagnosing cervical cancer: features are extracted from lower-resolution images to recognize the most suspect spots within the WSI, and a second ResNet-50 model then verifies these regions using higher-resolution images.

VGG networks: One popular architecture for deep CNNs is the VGG network, which is utilized extensively for medical image segmentation and pattern recognition at various context levels. Faust et al. (2018) utilized VGG19 to compute probability scores and t-SNE dimension reduction for region-based tissue classification and WSI visualization, allowing the tissue classes recognized by the approach to be viewed. Furthermore, Dodington et al. (2021) used VGG19 for tumor region recognition and then utilized a U-Net architecture to segment tiled images for benign vs. tumor tissue-level classification. Finally, to examine the influence of neoadjuvant chemotherapy treatment on breast cancer patients, tissue-level quantification is performed based on specific features such as nucleus shape, intensity, and texture. Chen et al. (2021a, b, c, d) segmented WSIs using VGG16, which extracted multi-level feature maps overlaid with superpixel contours to identify context scales and semantic patterns. For Gleason grading in prostate cancer, Xu et al. (2020) used VGG16 as a baseline and compared it with an SVM trained on LBP features.

3.3.3.2 Recurrent neural networks (RNNs)

RNNs are a kind of artificial neural network that can process sequential inputs of arbitrary length. Common applications include image analysis and recognition tasks that deal with ordinal or temporal data. RNNs differ from CNNs in their "memory," which stores information about past inputs and uses it to alter the present state and output. Campanella et al. (2019) and Kanavati et al. (2022) classified cancer using RNNs by aggregating the highest-ranking tiles at the slide level according to features produced by a CNN model; the RNN is successively provided with the most dubious tiles from each slide to predict the final slide-level classification. Ren et al. (2018) utilized Long Short-Term Memory (LSTM) networks to represent the spatial correlation between neighboring image tiles using the image features produced by an AlexNet approach, integrated with genomic sequencing data. The LSTM's acquired features are used as input for a multilayer perceptron (MLP) to develop computational biomarkers for predicting the survival of patients with prostate cancer. Iizuka et al. (2020) utilized LSTMs to integrate features produced by a CNN model via the segmentation of WSIs into a collection of tiles. A variable number of tiles is selected from the tissue regions of each slide and input into the RNN model; to eliminate dependence on the sequence of tile input, the tile order is randomized throughout each training cycle.
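
The tile-aggregation idea can be sketched as follows (an illustrative fragment, not the architecture of any cited study; the feature dimension, hidden size, and tile count are assumptions):

```python
# A minimal sketch of RNN-based aggregation of per-tile CNN features into a
# slide-level prediction; all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class SlideLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, tile_feats):            # (batch, n_tiles, feat_dim)
        _, (h_n, _) = self.lstm(tile_feats)   # final hidden state summarizes the tiles
        return self.head(h_n[-1])             # slide-level class logits

model = SlideLSTM()
tiles = torch.randn(4, 10, 512)               # 4 slides x 10 top-ranked tile features
print(model(tiles).shape)                     # torch.Size([4, 2])
```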

Mukherjee et al. (2019) proposed an updated RNN trained using a chain of CNNs, namely the Recurrent CNN (RCNN). To address the issue of super-resolution in WSI, the model was created to make use of several copies of each WSI image, each with a unique resolution. Each image instance is fed into a single CNN to create the subsequent higher-resolution instance of the same image; the next CNN takes the features produced by this stage as input, together with the third resolution level of the same image. This procedure continues until the most recent, highest-resolution image instance is recreated. The "RCNN" moniker comes from feeding the input features of one CNN into the next in a consecutive manner. Cheng et al. (2021) utilized multiple-resolution WSIs to predict the presence of lesion cells. This is achieved by a progressive approach to recognizing lesion cells, which integrates low- and high-resolution WSIs, together with an RNN-based classification model that assesses the severity of lesions in the WSI. The first step examines the WSI at a lower resolution using a CNN model to detect areas showing positive features; these identified regions are then verified at a higher resolution using another CNN model. Ultimately, the method detects and highlights the top 10 most dubious areas of abnormal tissue on every slide, allowing cytopathologists to conduct a more in-depth analysis. An RNN model is used to estimate the probability of a lesion degree, relying on the features of the top 10 suspicious images retrieved by the CNN.

3.3.4 Deep autoencoder networks

In an autoencoder neural network, the dimensions of the input and output layers are identical: the number of input units equals the number of output units. Thus, an autoencoder is a kind of unsupervised neural network, also known as a replicator neural network, since it reproduces its input data at the output. To predict whether patients would survive kidney malignancies, Cheng et al. (2018) utilized a sparse autoencoder. The sparse autoencoder was fed image patches of several nucleus types to obtain a low-dimensional representation of nuclei; similar nucleus features were then grouped utilizing k-means clustering to create a topological representation of the WSI. Awan and Rajpoot (2018) applied convolutional autoencoders to WSI registration via the optimization of feature similarity between static and dynamic images. To investigate how the loss function and latent representation impact the patch-based classification of WSIs for breast cancer diagnosis, Lomacenkova and Arandjelovic (2021) performed an experimental assessment using a deep autoencoder. The findings showed a significant reduction in the false-negative rate when task-specific loss function changes were made that more intelligently considered the content of particular patches. Sun et al. (2022) analyzed the association between the features learned by a variational autoencoder and the prognosis of colorectal cancer patients to predict their survival. The learned features were then utilized to enhance risk classification for colorectal cancer patients by predicting their survival following adjuvant treatment; clinical practitioners may find the signature-based features useful in estimating how long chemotherapy treatments should last. Çelik and Karabatak (2023) introduced autoencoders to assess the effects of using various input image dimensions for WSI compression, reconstructing high-dimensional WSIs via low-dimensional latent representations.
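
A minimal dense autoencoder sketch illustrating the equal input/output dimensionality described above (layer and latent sizes are arbitrary illustrative choices):

```python
# A minimal autoencoder sketch; 4096 could stand for a flattened 64x64 patch.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=4096, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)           # low-dimensional latent representation
        return self.decoder(z), z     # reconstruction matches the input dimension

patches = torch.rand(8, 4096)         # stand-in for flattened image patches
recon, latent = AutoEncoder()(patches)
loss = nn.functional.mse_loss(recon, patches)   # unsupervised reconstruction loss
```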

3.3.5 Graph deep embedding

Data is often represented as a graph to describe the connections between entities in various applications. The data format and the nature of the underlying problem dictate the optimal graph shape. A basic graph representation has n nodes linked by undirected edges; there are no parallel connections or loops, and the degree of each vertex cannot exceed (n-1). More complex graph topologies, including trees, cyclic and acyclic graphs, and bipartite graphs, are necessary for many applications. Various deep graph embeddings have been put forward in the literature, including structural deep network embeddings, graph convolutional networks, and deep neural networks for learning graph representations. By incorporating a node's position and surrounding data into the graph representation, node embeddings aim to reduce the data dimensionality. Machine learning and deep learning models may then be trained on these embedded nodes to perform node classification or link prediction.

Graph convolutional networks (GCNs): A graph convolutional network (GCN) aggregates characteristics from each node's neighborhood over many layers, following the principle of graph convolutions. The weights of the aggregation filter are learned to predict the node's class. The input data may be represented by a feature matrix of size (number of nodes) × (number of features per node), while edges are represented by an adjacency matrix. Each layer of the network trains a set of aggregation functions that take the neighborhood of the current node as input and output an updated feature matrix; this process is repeated throughout the network. The deeper a collection of features travels through the network, the more abstract it becomes. In the (Sharma et al. 2015); (Zhou et al. 2019); (Sureka et al. 2020); (Chen et al. 2021) studies, the link between detected features is established using graphs generated from the spatial features and structure of medical image components. This procedure requires a high level of preprocessing work, such as spatial feature extraction and segmentation. The last stage is training the GCN using the built graph structure and the spatial features of the nodes.
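
The propagation rule of a single GCN layer can be sketched as follows (an illustrative fragment in the style of Kipf and Welling's formulation, using dense matrices for readability; practical pipelines typically rely on sparse operations or dedicated graph libraries):

```python
# A minimal one-layer GCN sketch: normalized neighborhood aggregation
# followed by a learned linear map; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Linear(in_feats, out_feats, bias=False)

    def forward(self, X, A):
        A_hat = A + torch.eye(A.size(0))          # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))      # symmetric degree normalization
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.weight(X))

X = torch.randn(5, 16)              # 5 nodes x 16 features (e.g. nucleus descriptors)
A = (torch.rand(5, 5) > 0.5).float()
A = ((A + A.t()) > 0).float()        # make the adjacency symmetric/undirected
H = GCNLayer(16, 8)(X, A)            # updated 8-D node features
```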

Zhou et al. (2019) utilized a cell-level graph representation for colorectal cancer classification using a GCN. Each node in the graph represents a segmented nucleus, and these nodes are described using a set of shape and appearance characteristics. Using a specified k-nearest-neighbor parameter, the authors restricted the degree of the nodes and assessed their connections using the Euclidean distance. Ye et al. (2019) utilized the semantic features of a UNet segmentation model; these features capture the interdependence between lesion regions, forming the graph nodes and edges. Wang et al. (2020) utilized a CNN model to segment nuclei for Gleason grading in prostate cancer. The GCN model conducts Gleason scoring using a weakly supervised technique, with morphology descriptors and texture characteristics characterizing each nucleus. Generating graph embeddings for GCN training is another application of the features learned by segmentation models. Cell clustering is another way to create edges in a graph: Shi et al. (2021) clustered cells according to image similarity utilizing features acquired with DenseNet-121. Following the generation of clusters, relations between nodes are created; the GCN model takes the graph representation as input and utilizes line projection to learn features that are merged with CNN features. For breast cancer detection, Gao et al. (2022) created a patch-level CNN-GCN approach. Patches are described and spatial characteristics are generated using a convolutional neural network (CNN), with the connections between nodes established using the k-nearest-neighbor method. The clique GCN model then takes the built graph as input; the final output features pass through a graph pooling layer, fully connected layers, and a softmax function.

3.4 Medical image classification

In MIA, the classification of medical images for disease prediction, diagnosis, and treatment is achieved by utilizing ML and DL techniques. DL techniques are becoming increasingly popular because of their high accuracy and reduced time complexity when handling huge datasets and complicated tasks. Bone fracture detection and classification, the diagnosis of diabetic retinopathy and Computed Tomography (CT) emphysema, and the categorization of breast cancer lesions in histopathological images are some instances of medical image classification. Traditional methods for classifying images are frequently manual and less precise than DL and ML techniques. Figure 9 shows the process flow of medical image classification. Table 5 illustrates the comparative analysis of medical image classification approaches.

Fig. 9 The process flow of medical image classification

Table 5 Comparative analysis of medical image classification approaches

Medical image classification approaches can employ mathematical and statistical techniques to align and segment images for diagnosis and treatment. The problem formulation of medical image classification can be represented as follows:

$$c = f(x, \theta)$$
(5)

where \(c\) is the predicted class label or output of the classification approach, \(x\) is the input image, \(\theta\) denotes the approach parameters, and \(f\) is a function that maps the input image and parameters to the predicted output class. In implementation, \(f\) may assume several forms depending on the image classification technique utilized. Figure 10 presents a taxonomy of the classification approaches utilized for MIA. This taxonomy categorizes the classification approaches into two main categories: ML and DL.
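
As a concrete instantiation of Eq. (5) (our illustration, not tied to any cited approach), a linear classifier makes the roles of \(x\), \(\theta\), and \(f\) explicit:

$$f(x, \theta) = \underset{c}{\arg\max}\; \mathrm{softmax}\big(W\,\mathrm{vec}(x) + b\big)_c, \qquad \theta = \{W, b\}$$

where \(\mathrm{vec}(x)\) flattens the image into a vector; deeper models replace this linear map with a learned hierarchy of transformations.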

Fig. 10 Taxonomy of classification approaches

3.4.1 Machine learning approaches

As the foundation of today's AI revolution, ML holds unprecedented potential for medical image classification. For instance, ML has been demonstrated to be as effective as medical professionals in diagnosing various illnesses from medical images. There are various methods for medical image classification, such as Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Bayesian, and Naïve Bayes. This sub-section briefly describes each of these classification methods and reviews their related existing approaches.

3.4.1.1 Support vector machine (SVM)

SVM is a supervised ML method often utilized for image classification, suitable for both binary and multiclass classification tasks. SVM functions by identifying an optimal hyperplane that effectively discriminates between data points belonging to distinct classes within the feature space. In the context of image classification, the features often originate either from the raw image pixels themselves or from more complex representations acquired via feature extraction methods. Vogado et al. (2018) suggested an approach for diagnosing leukemia in blood images based on CNNs without segmentation. The approach extracts features directly from the images without any prior preprocessing; the obtained features are then classified using SVM. The results reported that more features are needed for classifying images with many leukocytes, while fewer features suffice for images with only one leukocyte. Xu et al. (2018) and Lu and Mandal (2015) applied multi-class SVM classifiers for classifying skin tissue characteristics, reporting classification accuracy of over 90% in distinguishing between cancerous and normal skin.
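
As an illustrative sketch of this pipeline (synthetic data stands in for real extracted features; the RBF kernel and parameters are arbitrary defaults, not those of any cited study):

```python
# A minimal SVM classification sketch over precomputed feature vectors
# (e.g. GLCM or CNN descriptors); data here is synthetic for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))             # 200 images x 32 texture features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in for normal/abnormal labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # RBF maps to a higher-dimensional space
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```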

Several authors utilized an SVM classifier with texture features extracted using the GLCM method. Ramasamy and K (2019) proposed an approach to automatically classify MRI brain images utilizing SVM and texture features extracted from the GLCM. The method consists of a hierarchical transformation technique (HTT), texture feature extraction, and classification. HTT employs optimal disk-shaped mask selection, top-hat and bottom-hat morphological operations, and mathematical operations for image preprocessing and enhancement. The GLCM is utilized to extract image contrast, correlation, energy, entropy, and homogeneity, and the SVM classifies the MRI brain images into two categories, normal and abnormal, utilizing the extracted features. Renita and Christopher (2020) proposed a Grey Wolf Optimized SVM approach (GWO-SVM) for retrieving significant information for medical applications. CT scan images are given as input, color and texture features are captured based on GLCM, and BoW is used for feature mapping. In this approach, the image retrieval time is high because the whole database is scanned to perform the retrieval, unlike the existing system in which images are retrieved and then classified. Prakash and Saradha (2021) utilized SVM for classifying MRI images and predicting cirrhosis disease following the preprocessing of MRI images and retrieval of the ROI image block. The low-dimensional texture features are extracted using LBP and GLCM. Although this method enhanced the classification efficiency, it lacks transparency.

Other authors hybridized SVM with CNN for classifying tumors of different origins. Dongyao Jia et al. (2020) presented an approach for classifying cervical cancer cells, and Baranwal et al. (2020) suggested an approach for the classification of brain tumors, both employing a fusion of CNN and SVM algorithms. The method encompasses the following steps: first, the cervical cell images are preprocessed for noise reduction and contrast enhancement; then a CNN is applied to capture robust features from the preprocessed images, and the extracted features are classified with SVM to distinguish between normal and abnormal cervical cells. The final findings showed that the CNN-SVM network demonstrated a notable accuracy of 95.5% in detecting cervical cancer cells and 94.5% in classifying brain tumor images. Furthermore, Deepak and Ameer (2021) applied CNN with SVM for medical image classification: a CNN is utilized for extracting features from brain MRI images, and a multiclass SVM is used with the CNN features to enhance performance. The Figshare open dataset was used for evaluation, and the results reported that the SVM classifier performed better than the softmax classifier on the CNN features. Khairandish et al. (2022) proposed an integrated approach of two widely used ML techniques, CNN and SVM, for segmenting MRI brain images and identifying the presence of tumors. The threshold segmentation method partitions images into distinct regions by considering pixel intensity levels and a designated threshold value; CNNs are employed for feature extraction, while SVMs classify the extracted features. The efficacy of this method is assessed on the BraTS dataset. The approach proposed by Nigudgi and Bhyri (2023) utilized a hybrid SVM transfer learning technique for classifying lung CT images into normal, benign, or malignant regions. The transfer learning methodology employs pre-trained CNN models, including AlexNet, VGG, and GoogleNet, to extract distinctive characteristics from the input images; subsequently, an SVM classifies the extracted features into cancerous or non-cancerous regions. The efficacy of this method is assessed using the IQ-OTH/NCCD dataset, and the findings demonstrate a significantly higher accuracy rate of 97% compared to currently available models.

Several authors utilized an SVM classifier for classifying brain MR images into tumor and non-tumor regions. Krishnakumar and Manivannan (2021) suggested an approach for the segmentation and classification of brain tumors in MR images. The preprocessing of MR images involves noise removal and contrast enhancement, and the tumor region is segmented utilizing a rudimentary K-means algorithm. A multi-kernel SVM captures the features from the segmented tumor region and classifies them into various categories of brain tumors. According to the article, the proposed approach demonstrated a notable accuracy of 97.5% in the segmentation and classification of brain tumors, surpassing other existing techniques. The methodology of the Rao and Karunakara (2022) study utilized a kernel-based SVM algorithm for classifying MR images into two distinct categories, tumor and non-tumor. The kernel function transforms the input data into a higher-dimensional space, enabling the SVM to identify a hyperplane that effectively distinguishes between tumor and non-tumor regions. The BRATS 2018, 2019, and 2020 datasets were used to evaluate the performance of this method, and the obtained results demonstrate notable levels of accuracy, sensitivity, and specificity. Pitchai et al. (2023) utilized a two-dimensional U-ConvNet to extract pertinent features from MRI images; an SVM then classifies the extracted features into tumor or non-tumor regions. The methodology is assessed on the BraTS 2019 dataset, and the findings demonstrate notable levels of accuracy, sensitivity, and specificity. Diabetic Retinopathy (DR) is a primary cause of blindness; it is usually caused by insufficient blood flow to the retina, retinal vascular exudation, and intraocular hemorrhage. Traditional classification methods consider only two-class classifiers for DR images. Hardas et al. (2022) considered a 16-class classification based on SVM to predict abnormalities individually or in combinations based on the chosen class, reporting 77.3% accuracy on the DIARETDB1 dataset.

3.4.1.2 Random forest (RF)

RF is a robust ensemble learning method that demonstrates significant efficacy in image classification problems. DL methods, such as CNNs, have become the prevailing approach for image classification due to their capacity to autonomously acquire hierarchical features; however, Random Forest remains a valuable alternative, particularly when data is limited or when the interpretability of the model is of utmost importance. Abraham and Nair (2018) utilized an SAE and a random forest classifier for the diagnosis of PCa lesions from MRI images. They explored ADASYN, SMOTE, and Weka-Resample to tackle the problem of class imbalance in PCa diagnosis.
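
A minimal RF sketch on extracted feature vectors (synthetic data; the forest size is an illustrative default) also shows the interpretability noted above via per-feature importances:

```python
# A minimal Random Forest classification sketch over image feature vectors;
# synthetic data stands in for real extracted features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 24))                 # 300 images x 24 features
y = (X[:, :3].sum(axis=1) > 0).astype(int)     # stand-in benign/malignant labels

rf = RandomForestClassifier(n_estimators=200, random_state=1)
rf.fit(X, y)
print(rf.feature_importances_[:5])   # feature importances aid interpretability
```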

Some authors utilized an RF classifier for classifying fusion features in breast cancer images. Paul et al. (2018) improved the RF classifier for detecting mitotic nuclei in histopathological datasets of breast cancer and an industrial dataset of dual-phase steel microstructures by reducing the number of trees based on the numbers of relevant and irrelevant features, using a theoretical upper limit to enhance classification accuracy. Li et al. (2023) presented a Self-Attention RF (SARF) approach to demonstrate the significance of fusion features in breast cancer images. A pyramid GLCM was utilized to extract multi-scale fusion features, and SARF applies adaptive refinement processing to these features, improving classification accuracy. Furthermore, the GridSearchCV technique optimizes the model's hyperparameters, effectively mitigating the constraints of manual parameter selection. The efficacy of SARF was validated on the BreaKHis dataset.

To enhance the accuracy of medical image classification, Alam et al. (2019) and Dhivyaa et al. (2020) developed approaches based on the RF method. Alam et al. (2019) suggested a ranking method to rank the features; RF was then applied to the highly ranked features to construct the predictor. Dhivyaa et al. (2020) introduced decision tree and random forest algorithms capable of producing high-resolution feature maps, thereby aiding the preservation of spatial details within the image. This method was evaluated on the ISIC 2017 and HAM10000 datasets and observed to exhibit higher accuracy than existing methods, as well as high resilience to artifacts and hair fibers that may be present in skin images. Chowdhury et al. (2019) presented a Random Forest (RF) approach for identifying abnormal lesions in retina fundus images, such as age-related macular degeneration and diabetic retinopathy, at early stages. The segmentation process utilized K-means clustering to extract both bright and dark lesions; morphological operations were subsequently employed to eliminate regular structures. The RF classifier was employed with a total of 14 features divided into two categories: shape-based features and region-based features.

For classifying various brain diseases, authors developed several approaches based on the RF method. Subudhi et al. (2020) proposed an automated approach to detect chronic stroke in MR images utilizing a diffusion-weighted image (DWI) sequence. Following the Oxfordshire Community Stroke Project (OCSP) method, it involves brain stroke segmentation and classification into Partial Anterior Circulation Syndrome (PACS), Lacunar Syndrome (LACS), and Total Anterior Circulation Stroke (TACS). To enhance detection accuracy, the affected portion of the brain is segmented utilizing the expectation maximization (EM) method, and the segmented region is then refined with the Fractional Order Darwinian PSO (FODPSO) approach. The segmented lesions were processed to extract various morphological and statistical features, which were subsequently classified with SVM and RF classifiers. Thayumanavan and Ramasamy (2021) applied various ML methods, such as RFC, SVM, and DT, for classifying brain images into two classes, normal and abnormal. These ML methods were used along with a median filter for optimizing skull stripping in MRI images, and DWT and HOG for feature extraction. This study reported that RF achieved better accuracy, strong robustness, and better convergence while also reducing simulation time, whereas SVM provided competent sensitivity and clear margin separation between classes. To detect and classify brain tumors and their stages from MRI images, Gupta et al. (2022) proposed an ensemble method. Initially, a modified InceptionResNetV2 pre-trained model is applied to detect tumors from MRI images; InceptionResNetV2 is then combined with a Random Forest Tree (RFT) to determine the cancer stage. Because the dataset is small, Cyclic Generative Adversarial Networks (C-GAN) are utilized to increase its size. Jackins et al. (2021) utilized Naive Bayes and RF classification algorithms for diagnosing diseases such as diabetes, heart disease, and cancer; the RF results reported better accuracy than Naïve Bayes.

To classify medical images of COVID-19 into different classes, Amini and Shalbaf (2022) and Shaheed et al. (2023) utilized extracted features with an RF classifier. Amini and Shalbaf (2022) utilized texture features and RF for classifying the CT images of each patient to determine the severity of COVID-19, with the extracted features classified into four major classes (normal, mild, moderate, and severe). To classify chest X-ray images as either positive or negative for COVID-19, Shaheed et al. (2023) proposed hybrid features with an RF classifier. Initially, a Gaussian filter and a logarithmic operator were applied to enhance contrast, reduce noise, and achieve image smoothing. Then, a hybrid feature extraction technique of CNN and GLCM was utilized to extract features from the chest X-ray images. Finally, an RF classifier categorizes the images as COVID-19 positive or COVID-19 negative. The methodology was assessed on the COVIDx dataset, and the findings indicate notable levels of accuracy, sensitivity, and specificity. To categorize medical images into several disease groups, Babenko et al. (2023) introduced an approach that employs an RF classifier, an ensemble learning algorithm comprising numerous decision trees. The Random Forest classifier can process large datasets and identify intricate connections between characteristics and classes.

3.4.1.3 K-nearest neighbor (KNN)

The KNN method is considered one of the simplest ML methods. The model merely stores the training data, learning from the whole training set; during prediction, it outputs the class with the highest occurrence among the 'k' closest neighbors, determined by a certain distance measure. The methodology presented in the Bhavani and Jiji (2018) study involves image registration techniques to align varicose ulcer images. Subsequently, feature extraction methods are applied to extract color, shape, and texture features, followed by a KNN classifier that classifies the images into distinct stages of wound progression based on the extracted features. The methodology was assessed on a dataset comprising images of varicose ulcers, and the findings demonstrate a notable level of accuracy, sensitivity, and specificity.
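
The decision rule just described can be sketched as follows (an illustrative fragment with synthetic features; k = 5 and the Euclidean metric are arbitrary defaults):

```python
# A minimal KNN sketch: "training" amounts to memorizing the feature vectors,
# and prediction is a majority vote among the k nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X_train = rng.normal(size=(150, 16))       # stored training feature vectors
y_train = (X_train[:, 0] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

query = rng.normal(size=(1, 16))           # an unseen image's feature vector
print(knn.predict(query), knn.predict_proba(query))  # vote among 5 neighbors
```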

The authors of the (Srinivas and Sasibhushana Rao 2019); (Saikia et al. 2022); and (Sejuti and Islam 2023) studies presented hybrid CNN-KNN approaches for classifying medical images. Srinivas and Sasibhushana Rao (2019) introduced a hybrid approach, denoted CNN-KNN, for the classification of brain tumors in MRIs. The CNN is widely recognized for its ability to extract features, which are subsequently utilized in conjunction with a KNN classifier to predict classes. The classification experiments, conducted on a selection of images from the BraTS 2015 and BraTS 2017 datasets, demonstrate 96.25% accuracy along with favorable error rate, F1-score, sensitivity, and specificity. To classify lung cancer into four distinct categories (small-cell carcinoma, adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma), Saikia et al. (2022) developed an approach that integrates four advanced deep CNNs (VGG-16, VGG-19, Inception-V3, and MobileNet-V2) with the KNN classifier. The research experiment comprises four distinct stages: image preprocessing, feature generation, feature extraction, and classification. The VGG-16 model combined with KNN demonstrated the highest validation accuracy in classifying lung nodules compared to the other models. To classify CT images as COVID-19 or non-COVID-19, Sejuti and Islam (2023) utilized a hybrid CNN-KNN approach with fivefold cross-validation. Initially, preprocessing techniques were applied, such as contrast enhancement, median filtering, data augmentation, and image scaling. The data is then split for training and testing, with fivefold cross-validation ensuring dataset generalization and preventing network overfitting. The 23-layer CNN architecture includes four convolutional layers, four max-pooling layers, and two fully connected layers for feature extraction, with features captured from the fourth convolutional layer. Finally, these features are classified using the KNN classifier instead of softmax, aiming to enhance accuracy. Xing and Bei (2019) developed an improved KNN approach based on cluster denoising and density cropping techniques and conducted a comparative analysis with conventional KNN. The authors employed clustering for denoising, enhancing the classification efficiency through accelerated KNN search speed while preserving classification accuracy. The experimental findings demonstrate that the presented approach effectively enhanced the classification efficiency of the KNN classifier when handling large datasets.

To classify medical images of COVID-19 into different classes, Shaban et al. (2020), Arslan and Arslan (2021), and Hamed et al. (2021) developed approaches based on the KNN classifier. To increase the classification performance on COVID-19 images, Shaban et al. (2020) introduced a hybrid feature selection methodology (HFSM) to capture the features from the input CT images and select the significant features for classification. The KNN classifier is then enhanced by adding strong heuristics for selecting the neighbors of the tested element, addressing the conventional KNN trapping issue. Arslan and Arslan (2021) presented a COVID-19 detection approach using a KNN classifier and CpG island features. CpG islands were used to extract robust and discriminative features from human coronavirus genome sequences, and KNN then classifies SARS-CoV-2 sequences. To enhance the performance of the KNN classifier, nineteen distance metrics across five categories were evaluated. Hamed et al. (2021) implemented a KNN variant (KNNV) model to classify COVID-19. To enhance KNN performance, rough set theory (RST) techniques are utilized to deal with both incompleteness and heterogeneity and to obtain an ideal K value for every patient to be classified. Furthermore, accurate Euclidean and Mahalanobis distance calculations were employed.

To unify the feature extraction and KNN classification procedures, Zhuang et al. (2020) suggested a deep kNN approach that encourages every training sample and its K nearest neighbors to belong to the same class while learning the feature extractor. To classify cancer cells from CT scan lung images, Vijila Rani and Joseph Jawhar (2022) developed an approach utilizing optimization techniques and a hybrid KNN-SVM classifier. The Advanced Surface Normal Overlap (ASNO) technique gives high accuracy to the hybrid classification method, and the Grey Wolf Optimized and Whale Optimization Algorithm SVM (GWO & WOA-SVM) method provides a clearly optimized value for the KNN-SVM classifier.

3.4.1.4 Artificial neural network (ANN)

ANNs are a class of ML models designed to mimic the structural and functional characteristics of the human brain. They are very potent ML models extensively used in medical image classification. Information is transmitted and processed via a series of interconnected nodes known as neurons. The input layer, at the top of the network, receives and processes the input data. The output layer, at the bottom of the network, generates the ultimate outcome, which, in image classification, corresponds to a probability distribution over all possible classes. The hidden layers, located between the input and output layers, perform intermediate computations. Korkmaz and Binol (2018) utilized ANN and RF classifiers for classifying stomach cancer images. To reduce the high dimension of the image features generated by Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG), different dimensionality reduction methods were used, such as SNE, MDS, LLE, LDA, t-SNE, and Laplacian Eigenmaps. The obtained results showed that the LBP_MDS_ANN and LBP_LLE_ANN methods reported better accuracy than the other methods.
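
The input-hidden-output layout described above can be sketched with scikit-learn's MLPClassifier (an illustrative fragment; the layer sizes and synthetic data are assumptions):

```python
# A minimal feed-forward ANN sketch; 20 input features and two hidden
# layers of 64 and 32 units are arbitrary illustrative choices.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))                  # input layer: 20 image features
y = (X @ rng.normal(size=20) > 0).astype(int)   # stand-in binary labels

ann = MLPClassifier(hidden_layer_sizes=(64, 32),  # two hidden layers
                    activation="relu", max_iter=500, random_state=3)
ann.fit(X, y)
print(ann.predict_proba(X[:2]))   # output layer: class probability distribution
```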

To classify features extracted from various medical image origins, Shaukat et al. (2019), Tumpa and Kabir (2021), and Manoharan et al. (2022) introduced approaches based on the ANN classifier. Shaukat et al. (2019) proposed a hybrid feature set and ANN for lung-nodule classification. Initially, optimal thresholding along with multiscale dot enhancement filtering was used for lung image segmentation. Feature vectors are then created from the intensity, texture, and shape (2D and 3D) features of the lung nodule candidates and utilized by the ANN classifier to minimize false positives (FP). To improve the accuracy of melanoma skin cancer classification and detection, Tumpa and Kabir (2021) developed an ANN-based hybrid texture features approach. Following segmentation with Otsu's global thresholding technique, the lesion's various features are captured with the ABCD, GLCM, and LBP techniques; the ANN applies target values to the input features to classify a lesion as normal or malignant. An improved ANN approach was suggested by Manoharan et al. (2022) to enhance the classification accuracy of lung diseases. A Gaussian filter was used for preprocessing; the discrete Fourier transform and Burg auto-regression capture features from the CT images, and PCA is utilized for feature reduction.

For chronic kidney disease detection and classification, Ma et al. (2020a, b) introduced a Heterogeneous Modified ANN (HMANN) approach. HMANN eliminates noise and supports the segmentation procedure for better detection of kidney stone location; SVM, ANN, and multilayer perceptron classifiers were tested. To classify normal and malignant nasopharyngeal carcinoma (NPC) tumors from endoscopy images, Mohammed (2020) proposed an ANN-based classification approach, where various features were tested to provide a suitable balance between true-positive and true-negative rates. Biswas and Islam (2021) proposed a brain tumor classification approach based on k-means and ANN. Initially, resizing, sharpening filtering, and contrast enhancement techniques are applied in the preprocessing stage; the preprocessed image is then segmented with the K-means method. From the generated clustered images, the 2D-DWT is employed to extract features, and PCA reduces the extracted features. Finally, an ANN is used for tumor classification.

To enhance the classification performance for breast cancer, Rahman et al. (2021) developed an ANN with the Taguchi approach: the Taguchi technique was utilized to determine the hidden-layer parameters of the ANN, tackling the issue of overfitting and improving classification accuracy across training, testing, and validation. Vijh et al. (2023) proposed an approach that integrates a bio-inspired algorithm with ANNs for multilevel image-thresholding segmentation. A bio-inspired algorithm, whose specific type is not indicated in the search findings, is employed to optimize the threshold search procedure, and ANNs classify the segmented regions into distinct categories. The proposed approach is specifically tailored to the analysis of histopathology images, which play a significant role in diagnosing and treating diverse medical conditions.

3.4.1.5 Bayesian

Bayesian approaches may be effectively used in image classification tasks to capture and represent uncertainty, hence enabling probabilistic predictions. BNNs are well recognized as a prominent Bayesian approach for image classification. BNNs extend standard neural networks by treating network weights as probability distributions instead of fixed values. In a BNN, each weight parameter customarily possesses an associated probability distribution, often a Gaussian (normal) distribution. During training, BNNs acquire knowledge about the most suitable weights to optimize their performance and additionally evaluate the level of uncertainty associated with these weights. For classifying histopathological images of colorectal cancer, Rączkowski et al. (2019) developed an accurate, reliable, and active approach based on a Bayesian CNN (B-CNN). The learning process is accelerated by utilizing a variational dropout-based entropy measure, and the B-CNN is used to generate segmentation-based spatial statistics and segment whole-slide images of colorectal tissue. For the classification of breast cancer and diabetic retinopathy, Alroobaea et al. (2020) suggested a fully Bayesian approach. Flexible finite mixture (FMM) models with Gaussian, bounded, generalized, and bounded generalized Gaussian bases are investigated, concentrating on the more adaptable statistical models described by the bounded generalized Gaussian mixture model to address the challenge of accurate data classification. Additionally, the issue of over- or under-fitting is addressed by accounting for uncertainty via prior knowledge of the model's parameters.

To enhance classification performance, Kwon et al. (2020) and Ekong et al. (2022) developed approaches based on BNNs. Kwon et al. (2020) presented a Bayesian neural network that quantifies the uncertainty of predicted classification outcomes by decomposing a predictive uncertainty measure into aleatoric and epistemic components. The output's underlying distribution and numerical stability are used to express the intrinsic variability. Ekong et al. (2022) presented an approach that integrates the Bayesian algorithm with depth-wise separable convolutions for precise classification and prediction of brain tumor MRI images. The depth-wise CNN architecture enables efficient computation by applying individual convolutions to each input channel, minimizing the number of parameters and the computational complexity. The incorporation of uncertainty estimation within the Bayesian framework holds significant value in medical image classification tasks.

To address the issue of Uncertainty Quantification (UQ) in skin cancer image classification, Abdar et al. (2021) used three UQ methods: Monte Carlo (MC) dropout, Ensemble MC (EMC) dropout, and Deep Ensemble (DE). Further, a hybrid dynamic Bayesian-DL approach is proposed to deal with the remaining uncertainty using Three-Way Decision (TWD) theory. This approach allows different UQ and deep neural network methods to be used at various classification stages to prevent wrong skin cancer classification. To tackle the memory constraint associated with whole-slide image (WSI) classification, Yu et al. (2023) introduced a Bayesian Collaborative Learning (BCL) approach. The proposed approach incorporates an auxiliary patch classifier into the target multiple instance learning (MIL) classifier, allowing collaborative learning of the feature encoder and the MIL aggregator while mitigating the memory bottleneck. The cooperative learning is designed within a unified Bayesian probabilistic framework, and an iterative EM algorithm estimates the optimal parameters in a principled manner.
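
Of the UQ methods named above, MC dropout is the simplest to sketch (an illustrative fragment; the tiny network and 30 stochastic passes are assumptions, not the setup of any cited study):

```python
# A minimal Monte Carlo dropout sketch: keep dropout active at inference and
# read predictive uncertainty from the spread of repeated stochastic passes.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.5),
                    nn.Linear(64, 2))
net.train()                       # train mode keeps dropout active at test time

x = torch.randn(1, 16)            # a feature vector for one lesion image
with torch.no_grad():
    samples = torch.stack([torch.softmax(net(x), dim=-1) for _ in range(30)])

mean_prob = samples.mean(dim=0)   # predictive class probabilities
std_prob = samples.std(dim=0)     # spread across passes ~ predictive uncertainty
print(mean_prob, std_prob)
```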

3.4.1.6 Naïve bayes

This method is a supervised learning method that utilizes the principles of Bayes' theorem for classification tasks. Naïve Bayes is a straightforward probabilistic classification method, often used in text classification tasks involving high-dimensional datasets. The studies of (Zaw et al. 2019); (Balaji et al. 2020); (Ibrahem Alhayali et al. 2020); (Xiong et al. 2021); and (Mansour et al. 2022) aim at enhancing classification performance. To classify brain tumors from MRI, Zaw et al. (2019) introduced a Naïve Bayes approach in which multiple methods, such as morphological operations, pixel subtraction, maximum entropy thresholding, and statistical feature extraction, are used to enhance the classification performance of Naïve Bayes. Balaji et al. (2020) developed a modified graph cut method and a probabilistic Naïve Bayes classifier for skin lesion segmentation and classification; the graph cut reduces the running time and processing resources, and feature extraction from the segmented regions improves the classification. Ibrahem Alhayali et al. (2020) used the Hoeffding tree and Naïve Bayes to enhance breast cancer classification accuracy: the Hoeffding tree selects the optimal splitting features, and the issue of high dimensionality is addressed with Naïve Bayes. To improve the performance of cancer classification, Xiong et al. (2021) used Cost-Sensitive Naive Bayes (CSNB) as a stacking ensemble learner. The fast correlation-based feature selection (FCBF) method removes unwanted features, and the stacking ensemble uses different learners such as Library-SVM, KNN, DT, and RF; the presented CSNB shows effectiveness and robustness for classification. Feature Correlated Naïve Bayes (FCNB) was introduced by Mansour et al. (2022) to improve and speed up the classification of COVID-19 patients. Initially, FCNB performs a preprocessing step that includes: (1) feature selection to retain the most informative features; (2) feature clustering to group the selected features into several clusters known as Master Features (MFs), each holding a collection of similar features; and (3) master feature weighting to assign a weight to every MF based on an optimal weight calculation method. Finally, a modified weighted Naïve Bayes classifies the COVID-19 patients. FCNB reported improved performance over traditional weighted NB, reduced time, and fewer required MFs.
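
A minimal Gaussian Naïve Bayes sketch on continuous image features (synthetic data; the per-class Gaussians embody the method's "naïve" per-feature independence assumption):

```python
# A minimal Naive Bayes sketch: fit() estimates per-class feature means and
# variances, and prediction applies Bayes' theorem under independence.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (100, 10)),    # class 0 feature vectors
               rng.normal(1.5, 1.0, (100, 10))])   # class 1 feature vectors
y = np.repeat([0, 1], 100)

nb = GaussianNB()
nb.fit(X, y)
print(nb.predict(X[:2]), nb.predict_proba(X[:2]))  # posterior class probabilities
```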

Distance Biased Naïve Bayes (DBNB) was presented by Shaban et al. (2021) for classifying COVID-19-infected patients. The Advanced PSO (APSO) feature selection method contains two phases: the initial selection phase selects the most informative features, and the final selection phase refines the selection using Binary PSO. The classification then combines the evidence of a statistically Weighted Naïve Bayes and Distance Reinforcement. Nayak and Kengeri Anjanappa (2023) presented a hybrid Naïve Bayes classifier approach for classifying MRI brain images, with the objective of enhancing classification accuracy through an efficient and expeditious method that can identify a limited set of optimal parameters. The proposed model undergoes four distinct steps: image preprocessing, feature extraction, and feature reduction, followed by the application of a Naïve Bayes classifier.

3.4.2 Deep learning

The use of DL, specifically CNNs, has shown remarkable achievements in image classification. This may be attributed to its inherent capacity to autonomously acquire hierarchical features from images, endowing it with substantial efficacy for diverse applications such as MIA, object recognition, and many others.

Some studies hybridized CNN and RNN for classifying medical images. Liang et al. (2018) did so for blood cell image detection and classification: the RNN addresses the long-term dependence correlation between key image features and labels, and a transfer learning Xception-Long Short-Term Memory (LSTM) technique, pre-trained on the ImageNet dataset, is applied in the CNN phase. The experimental results of this approach reported high classification performance. A CNN (DenseNet) and RNN (LSTM) approach was developed by Yao et al. (2019) for classifying breast cancer histology images into four categories (normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma). CNN and RNN are utilized for capturing features from the images, and a special perceptron attention mechanism adopted from Natural Language Processing (NLP) unifies the extracted features. Finally, the model's performance and robustness are enhanced by integrating the latest switchable normalization technique with targeted dropout regularization. Yan et al. (2020) introduced an approach integrating CNN and RNN to classify histopathological images of breast cancer. This approach combines the strengths of convolutional and recurrent neural networks to leverage the enhanced multi-level feature representation of histopathological image patches, preserving both short-term and long-term spatial correlations between patches. Suganyadevi et al. (2022) evaluated the strengths and weaknesses of current early DR diagnostic methods, classifying early DR detection approaches as deep learning, traditional image processing, or conventional machine learning.

Several authors applied DL to improve the classification performance of medical images, especially CNN performance, by increasing the training data. Mikołajczyk and Grochowski (2018) utilized data augmentation based on the integration of color modification and affine image transformations such as rotating, cropping, and zooming, along with histogram-based methods, up to style transfer and GANs. For melanoma image classification, Brinker et al. (2019) utilized a pre-trained ResNet50 CNN. To enhance the CNN's performance they: (1) replaced the single learning rate for all layers with differential learning rates; (2) reduced the learning rate following a cosine function; and (3) used stochastic gradient descent with restarts to avoid local minima. Jeyaraj and Samuel Nadar (2019) utilized a partitioned CNN to classify oral images into benign and cancerous, examining the HSI features of oral cancer based on the stochastic neighbor embedding technique; the experimental results reported improved classification accuracy. Gour et al. (2020) introduced a residual learning-based 152-layer CNN model, named ResHist, to classify breast histopathological images into benign and malignant classes based on extracted rich and discriminative features; data augmentation is further applied to ResHist to improve the classification performance. The evaluations demonstrate the superiority of ResHist over the AlexNet, VGG16, VGG19, GoogleNet, Inception-v3, ResNet50, and ResNet152 models. To improve classification performance and gain higher accuracy on the MNIST medical dataset, Dash et al. (2023) combined CNN models with illumination normalization methods. The conventional networks MobileNetV2 and SqueezeNet were utilized in this study, along with Tan-Triggs and isotropic lighting enhancement. Extensive trials illustrated that this approach effectively handles images with inconsistent illumination and provides improved classification accuracy.
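
The affine and color augmentations mentioned above can be sketched with torchvision transforms (an illustrative fragment; all parameter values are arbitrary choices, not those of the cited studies):

```python
# A minimal data-augmentation sketch: rotation, crop/zoom, color jitter,
# and flipping produce new training variants of the same image.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # crop + zoom
    transforms.ColorJitter(brightness=0.2, contrast=0.2),      # color modification
    transforms.RandomHorizontalFlip(),
])

image = transforms.ToPILImage()(torch.rand(3, 256, 256))  # stand-in medical image
augmented = augment(image)   # one randomized training variant of the image
```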

Several authors proposed DCNNs for classifying medical images. Mateen et al. (2018) suggested a VGG-19 DNN approach for classifying diabetic retinopathy from fundus images. In this approach, VGGNet is used for extracting the features, while PCA and SVD are utilized to decrease the dimensionality of the large-scale retinal fundus images; moreover, a GMM with an adjustable learning rate is applied for region segmentation. Oktay et al. (2018) developed an attention U-Net utilizing a bottom-up attention gate, enabling exact emphasis on a specified area and highlighting valuable characteristics without incurring additional computational cost or model parameters. The proposed approach is both versatile and adaptable, making it readily applicable to image classification and regression tasks. To address the problem of overfitting, Kaur and Gandhi (2020) conducted a comparative analysis of various pre-trained deep CNNs with transfer learning, evaluating their effectiveness in automatically classifying MR brain images into normal vs. abnormal and glioma vs. meningioma vs. pituitary categories. Efficient classification was performed using a range of pre-trained deep CNNs, including AlexNet, ResNet50, GoogLeNet, VGG-16, ResNet101, VGG-19, Inception-v3, and InceptionResNetV2. To classify breast cancer tissue images into benign, malignant, and eight subtypes, Jiang et al. (2019a, b) developed a CNN with a small SE-ResNet module; to reduce training parameters and overfitting, the small SE-ResNet combines the residual unit with the Squeeze-and-Excitation block. Yadav and Jadhav (2019) conducted a comparative study of three CNN-based models for classifying pneumonia from chest X-ray images: (1) a linear SVM classifier, (2) transfer learning of VGG16 and InceptionV3, and (3) training a capsule network. Data augmentation is applied to all three models to enhance classification performance. CNN-based transfer learning proved the superior model, and the capsule network performed better than the SVM classifier.

Some studies applied CNNs to chest X-ray images to achieve precise classification of COVID-19. (Reshi et al. 2021) suggested a CNN-based approach to tackle several challenges, including a limited dataset size, imbalanced class distribution, and image quality concerns; the dataset underwent various preprocessing stages, encompassing dataset balancing, image analysis by medical experts, and data augmentation. Furthermore, (Madaan et al. 2021) presented a two-phase X-ray image classification method named XCOVNet for detecting early cases of COVID-19 using a CNN. The detection process is carried out in two distinct phases: the first preprocesses a dataset comprising 392 chest X-ray images with an equal distribution of COVID-19 positive and negative cases, while in the second phase the CNN is trained and fine-tuned, attaining a patient classification accuracy of 98.44%. (Chakraborty et al. 2022) utilized transfer learning on the pre-trained VGG-19 architecture for classifying chest X-ray images into three groups: COVID-19, pneumonia, and healthy cases, with MongoDB serving as a database for storing each original image along with its corresponding category. The analysis was conducted on a publicly available dataset of 3797 X-ray images: COVID-19 (1184 images), pneumonia (1294 images), and healthy individuals (1319 images).
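
A minimal sketch of the transfer-learning setup these studies share, freezing a pre-trained VGG-19 backbone and replacing its head with a three-class classifier, is shown below; the freezing policy and hyperparameters are illustrative assumptions rather than the cited configurations.

```python
import torch.nn as nn
import torchvision.models as models

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                # freeze the convolutional backbone
model.classifier[6] = nn.Linear(4096, 3)   # COVID-19 / pneumonia / healthy head
```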

(Alnaggar et al. 2022a, b) proposed an approach for classifying tumor regions from 3D brain MRI that combines Hyper-Layer CNNs (HL-CNNs) and Hyper-Heuristic Extreme Learning Machines (HH-ELM). Initially, the MRI images are preprocessed using denoising and image enhancement techniques. The HL-CNN is then employed for feature extraction, and the optimal features are selected with a correlation-based method via HL-CNN validation to mitigate irrelevant features within the system. Finally, the HH-ELM serves as the classification model for categorizing tumor images and distinguishing between various types of tumors. Furthermore, (Yang et al. 2022a, b) introduced a training strategy called Deep Tree Training (DTT) with a two-stage selective ensemble of CNN branches. DTT simultaneously trains a set of networks built from the hidden layers of a CNN in a hierarchical structure, addressing the issue of vanishing gradients by providing additional gradients to the hidden layers. Consequently, base classifiers can be acquired on mid-level features while minimizing the computational load of an ensemble solution. The CNN base learners are then merged into the optimal classifier using a two-stage selective ensemble method that considers both accuracy and diversity criteria.

Authors in (Alnaggar, Jagadale, & Narayan, 2022) proposed an approach for identifying tumors by applying a hybrid segmentation algorithm and a novel CNN-based classifier optimized with the Chimp Optimization Algorithm (COA). The approach encompasses several stages: preprocessing, segmentation, feature extraction, and classification. Preprocessing applies a wavelet filter for denoising and adaptive histogram equalization for image enhancement. The Boosted Crossbred Random Forests (BCRF) algorithm then segments the regions of interest to accurately detect and classify the presence of skull, tumor, and lesions, and the features are obtained using the GLCM and Gabor filters. Tumor classification is ultimately accomplished with the COA-CNN. Rocha et al. (2023) introduced a novel image classification approach called Hybrid CNN Ensemble (HCNNE), which integrates image features derived from a CNN and LBP; these features are subsequently employed to construct a collection of multiple classifiers. The authors combine the Euclidean distances between the LBP feature vectors of each training class with the confidence values obtained from classifying the CNN features with an SVM, forming the input for a multi-layer perceptron classifier. Finally, these attributes are also fed to other classifiers to construct the ultimate voting ensemble.

3.4.3 Other classification approaches

Many methods other than ML and DL are utilized for medical image classification tasks, such as clustering and optimization techniques. (Vivona et al. 2018) proposed an automated approach for indirect immunofluorescence image classification in which the centromeres present on cells are grouped using the K-means clustering method; it reported minimal intra- and inter-laboratory variability and a significant enhancement in positive/negative classification performance. To address the issue of inconsistent labeling in retinal image classification, (Luo et al. 2020) introduced a Self-supervised Fuzzy Clustering Network (SFCN) consisting of three major parts: stacked convolutional layers first capture the features from the retinal images; a reconstruction module then ensures that the extracted features accurately represent the input; and finally, fuzzy self-supervision directs the training of the entire network. The SFCN uses the results of fuzzy clustering to supervise the network and constrains it to output the probability of each retinal image belonging to each cluster.
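
A minimal scikit-learn sketch of the K-means grouping step underlying such approaches is given below; the synthetic per-cell feature vectors and the choice of k are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cell_features = rng.normal(size=(300, 16))  # stand-in for per-cell descriptors

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(cell_features)  # one cluster label per cell
```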

For the diagnosis of COVID-19, (Mittal et al. 2021) introduced a K-means clustering method in which the improved gravitational search algorithm (IGSA) selects the optimal clusters, which are then utilized to classify COVID-19 and non-COVID-19 cases. To address issues such as the small and variable size of cells and heavy noise in histology images, (Saturi & Parvataneni 2022) proposed an optimization-based superpixel-clustering approach. A normalization method first improves the input image quality; nuclei and non-nuclei cells are then segmented via superpixel clustering with PSO and the Grey Wolf Optimizer (GWO). Local Direction Ternary Pattern (LDTP), perimeter, solidity, circularity, GLCM, eccentricity, and color auto-correlogram features are subsequently extracted, and an SVM finally classifies the breast histology images as cancerous or benign.
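
The cited work selects and tunes superpixel clusters with PSO and GWO; as a simpler stand-in, the sketch below only shows plain SLIC superpixel generation with scikit-image on a bundled histology-like sample image.

```python
from skimage import color, data, segmentation

image = data.immunohistochemistry()       # bundled histology-like RGB sample
segments = segmentation.slic(image, n_segments=200, compactness=10)
overlay = color.label2rgb(segments, image, kind='avg')  # mean color per superpixel
```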

3.5 Medical image visualization

Image visualization is an extensively studied domain that enables healthcare professionals to rapidly examine medical images, establish a diagnosis, and formulate an appropriate treatment strategy. Medical images may be visualized using many approaches, such as volume rendering, geometry rendering, or integrating both. Furthermore, each approach may use distinct visual factors, such as color categorization, an object's dimensions, or motion.

3.5.1 Direct volume rendering (DVR)

Direct Volume Rendering refers to a collection of methods used to visually represent three-dimensional image data. Volume rendering requires mapping each sample value to both an opacity and a color; this mapping is achieved through a transfer function. Ma et al. (2020a, b) implemented a DVR technique that lets users specify the desired feature via transfer functions, using volume rendering to depict regions of uncertainty in the final volume representation via distortion. Weiss and Navab (2021) presented the Deep Direct Volume Rendering (DeepDVR) approach, which utilizes implicit mappings to extract features and perform classification; it extends DVR by incorporating deep neural networks within the rendering pipeline, and the authors showed that it successfully facilitates end-to-end learning of scientific volume rendering tasks from target images. (Jung 2021) proposed a clustering-based ray analysis approach to create an automated transfer function for medical DVR, validating its efficacy on many medical datasets. Kim et al. (2021) introduced a technique for colorizing transfer functions using convolutional neural networks (CNNs), enabling the automated generation of direct volume rendering images (DVRI) that closely resemble a given target image, likewise validated on several medical datasets. Weiss and Westermann (2022) introduced a differentiable volume rendering method that allows differentiation with respect to all continuous factors involved in the volume rendering process; the authors illustrate its use for automated viewpoint selection and for optimizing a transfer function from rendered images of a specified volume. Xu et al. (2023a, b) integrated cinematic volume rendering (CVR) enhancements into the open-source visualization toolkit (vtk.js) to improve WebXR compatibility, and briefly reported two experiments assessing the quality and speed of different CVR approaches on diverse medical data.
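
To illustrate the transfer-function idea at the heart of DVR, the following NumPy sketch maps normalized scalar samples to RGBA values by piecewise-linear interpolation over control points; the control-point values are illustrative assumptions.

```python
import numpy as np

def apply_transfer_function(volume, ctrl_x, ctrl_rgba):
    """Map normalized scalars in [0, 1] to RGBA by piecewise-linear interpolation."""
    flat = volume.ravel()
    rgba = np.stack([np.interp(flat, ctrl_x, ctrl_rgba[:, c]) for c in range(4)], axis=-1)
    return rgba.reshape(volume.shape + (4,))

ctrl_x = np.array([0.0, 0.3, 0.6, 1.0])         # sample-value control points
ctrl_rgba = np.array([[0.0, 0.0, 0.0, 0.0],     # low values: fully transparent
                      [0.8, 0.5, 0.4, 0.1],     # soft tissue: faint reddish
                      [1.0, 0.9, 0.8, 0.4],     # denser tissue: more opaque
                      [1.0, 1.0, 1.0, 1.0]])    # highest values: opaque white
volume = np.random.rand(64, 64, 64)             # stand-in for a normalized CT volume
rgba_volume = apply_transfer_function(volume, ctrl_x, ctrl_rgba)
```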

3.5.2 Indirect volume rendering (IVR)

Indirect Volume Rendering relies on a surface-mesh representation that is not directly connected to the volume being rendered; the mesh is created either by extracting an isosurface from the source volume data or by converting a segmentation result. Li et al. (2010) implemented two integrated approaches, the Improved Volume Rendering Optical Model (IVROM) and the Improved Translucent Volume Rendering Method (ITVRM), for depicting translucent medical volumes; IVROM accounts for the influences of volumetric shadows as well as direct and indirect scattering. Park et al. (2011) proposed an approach for segmenting and visualizing anatomical features from three-dimensional medical images: segmentation with the level-set approach is first accomplished using a surface evolution framework based on the geometric variational principle, and a hybrid volume rendering method integrating indirect and direct volume rendering techniques is then used to visualize the segmented deformable features. Li et al. (2013) developed a system based on the open-source VTK for visualizing CT, MR, PET, and SPECT scans, employing several visualization techniques including indirect volume rendering (IVR). Gillmann et al. (2018) presented a visual method to represent the level of uncertainty in extracted surfaces, using semitransparent isosurfaces to represent probable locations of surface points.
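
A minimal sketch of the isosurface-extraction step that IVR relies on, using scikit-image's marching cubes on a synthetic sphere volume; both the volume and the iso-value are assumptions for illustration.

```python
import numpy as np
from skimage import measure

# Synthetic volume whose value decreases with distance from the centre
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
volume = 1.0 - np.sqrt(x**2 + y**2 + z**2)

# Extract the triangle mesh at iso-value 0.5 (a sphere of radius 0.5)
verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```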

3.5.3 Glyph-based Visualization

Glyph-based visualization approaches encode various attributes using a collection of visual elements. Glyphs are a significant means of graphically representing tensors, or any case in which multiple features must be represented visually. Zhang et al. (2016) presented a systematic approach for comparing two distinct sets of tensors, using combined glyphs to create a visual depiction for comparison; besides permitting the comparison of various datasets, this can provide a first assessment of the variability in DTI datasets. Abbasloo et al. (2016) presented an approach that simplifies the examination of tensor covariance via visual analysis at various levels of detail: slice views and direct volume rendering first convey significant changes in the covariance structure and areas with substantial overall variation; interactive tools then enable the examination of different forms of variability, such as shape or orientation; and finally the analyst can concentrate on particular areas of the field, using tensor glyph animations and overlays that visually represent confidence intervals at those exact spots in an intelligible manner. Gerrits et al. (2019) proposed a method for representing uncertain second-order symmetric tensors in 2D and 3D using glyphs; the approach can extend many categories of existing glyphs for symmetric tensors to include uncertainty, thereby serving as a potential basis for uncertainty-aware tensor glyph design. As an example, they use the well-recognized superquadric glyphs and demonstrate that the uncertainty visualization meets all of their design restrictions. Ristovski et al. (2019) presented an interactive visual analysis approach that incorporates uncertainty information: the system utilizes coordinated views of volume visualizations, a slice-based visualization of 2D uncertainty glyphs with a perceptually consistent color-coding scheme, and parameter analysis plots, where the glyphs use a distinctive color coding to emphasize the level of confidence in the region to be treated. The approach was applied to simulating radiofrequency ablation procedures for the treatment of liver tumors, and its effectiveness was evaluated via a well-conducted user study. Khawatmi et al. (2022) presented ShapoGraphy, a web-based program that enables users to generate personalized and user-friendly visual representations of biomedical data; its ability to effortlessly integrate and arrange diverse shape glyphs, including hand-drawn shapes, offers a convenient way to assess distinct designs.

3.5.4 Geometry methods

Geometric methods are essential for medical image visualization. These techniques are used to analyze and display several forms of medical imaging modalities, including CT, MRI, and ultrasound. Geometric modeling using computational geometry is used for the manipulation of three-dimensional objects in computer graphics and image processing; crucial components of these methods include Virtual Reality (VR), three-dimensional modeling, fractal geometry, and differential geometry. To this end, (Chen et al. 2018) proposed a Virtual Reality method to generate a visual representation of human body parts in a 3D environment based on medical image data, enabling clinicians to perform disease analysis, surgical training, or surgical teaching in a virtual setting. In (Çalışkan 2017), 3D images are acquired from 2D CT slices using volumetric data obtained via medical imaging techniques including MR and CT; the study explores several methods of 3D imaging, including fractal geometry visualization, fundamental data types, transformation into primary graphical elements, and imaging techniques. Van Nguyen et al. (2020) explore the use of geometric modeling to visualize medical imaging databases, investigating surface and volume imaging approaches, preprocessing, and segmentation.

4 Comparative analysis

This section presents a detailed comparative analysis evaluating the performance of prominent MIA approaches in terms of accuracy, precision, recall, F-measure, mIoU, and specificity over five publicly available datasets: ISIC 2018, CVC-Clinic, 2018 DSB, DRIVE, and EM. Nine approaches from the literature are considered: U-Net (Ronneberger et al. 2015), U-Net++ (Zhou et al. 2018), Attention U-Net (Oktay et al. 2018), ResUNet++ (Jha et al. 2019), DoubleU-Net (Jha et al. 2020), TransUNet (Chen et al. 2021), DS-TransUNet (Lin et al. 2022), FANet (Tomar et al. 2022), and DCSAU-Net (Xu et al. 2023a, b). Tables 6 and 7 illustrate the performance analysis of these MIA approaches; bold entries in Tables 6 and 7 indicate the optimal results per dataset (ISIC 2018, CVC-Clinic, 2018 DSB, DRIVE, and EM) in terms of accuracy, precision, recall, F-measure, specificity, and mIoU.

Analyzing Fig. 11, in the ISIC 2018 dataset DCSAU-Net stands out as a top performer, demonstrating superior accuracy, recall, and F-measure. DoubleU-Net impressively achieves a precision of 94.59%, maintaining a well-balanced performance, and DS-TransUNet excels in mIoU with a notable score of 85.23%, whereas TransUNet exhibits relatively lower precision. In the CVC-Clinic dataset, DS-TransUNet consistently attains the highest scores across recall (95.00%), F-measure (94.22%), and mIoU (89.39%); FANet excels in accuracy, boasting an impressive 99.16%, while DoubleU-Net showcases a higher precision of 95.92%. In contrast, U-Net and ResUNet++ display comparatively lower performance in specific metrics. In the 2018 DSB dataset, DS-TransUNet emerges as a standout performer in recall, F-measure, and mIoU; FANet excels in accuracy, and DoubleU-Net excels in precision, although it is worth noting that DoubleU-Net records the lowest recall values. Careful consideration of these results is essential when selecting a segmentation model, taking into account the specific needs of the task at hand.

Fig. 11 Comparative analysis in terms of accuracy, precision, recall, F-measure, and mIoU

Examining Fig. 12, in the DRIVE dataset FANet consistently maintains superior performance across all metrics except specificity, highlighting its robust effectiveness in semantic segmentation tasks. Notably, U-Net excels in specificity, achieving an impressive 98.27% and surpassing the other approaches. In the EM dataset, FANet consistently outperforms the alternative methods across all evaluated metrics, firmly establishing itself as the leading segmentation model. Although U-Net serves as a baseline, its lack of reported values for some metrics hinders a thorough assessment of its strengths and weaknesses.

Fig. 12 Comparative analysis in terms of F-measure, precision, recall, specificity, and mIoU

5 Existing medical datasets

Throughout the literature, various image collections serve as training and testing data for building medical image processing approaches. It is essential to differentiate between disease-specific medical image datasets and general ones. Some diseases are so specific that collecting medical images which can then be analyzed and labeled by physicians is challenging; it is a time-consuming procedure that frequently entails high economic costs, which keeps many dataset collections private and accessible only for a fee. Some researchers choose to partner with a healthcare organization and collect images directly from patients; however, this is a long and slow process. To tackle this problem, free-to-use collections, challenges, and contests are available online. Table 8 shows various image modalities, including CT, MRI, PET, US, and X-ray, categorized according to the body organs from which the images were obtained (brain, breast, lungs, skin, eye, and others).

5.1 BRATS

The BraTS (Brain Tumor Segmentation) datasets are a collection of brain MRI images extensively utilized to build and test brain tumor segmentation approaches. The datasets comprise scans from several institutions and were produced by the Medical Image Computing and Computer-Assisted Intervention (MICCAI) Society. The BraTS datasets are freely accessible and important for developing and assessing brain tumor segmentation approaches. Different versions of the BraTS datasets exist, as shown in Fig. 13; the most used are as follows:

  • BraTS 2015 dataset consists of MRI images of 220 high-grade and 54 low-grade gliomas (Menze et al. 2015).

  • BraTS 2017 dataset includes 285 brain tumor MRI images in the T1, T1ce, T2, and FLAIR MRI modalities. Additionally, the collection offers complete brain tumor masks with ED, ET, and NET/NCR labels (Bakas et al. 2017).

  • BraTS 2018 dataset offers multi-dimensional 3D brain MRIs with four different MRI modalities per case (T1, T1c, T2, and FLAIR) and ground truth brain tumor segmentations annotated by medical professionals (Bakas et al. 2018).

Fig. 13 Sample brain images from the BraTS datasets

5.2 IBSR

The IBSR (Internet Brain Segmentation Repository) dataset contains brain MRI images that are freely available to the public. It was produced by the Center for Morphometric Analysis at Massachusetts General Hospital and includes 3D T1-weighted MRI images of 18 normal cases, together with expert manual segmentations of different brain structures, including the white matter, grey matter, and cerebrospinal fluid. The dataset is often utilized for building and validating automated brain segmentation models.

5.3 OASIS

The OASIS (Open Access Series of Imaging Studies) dataset is a publicly available dataset of brain MRI. It was created by the Washington University Alzheimer's Disease Research Center and contains over 1,500 brain MRIs of normal and Alzheimer's disease images including T1-weighted and T2-weighted. The OASIS dataset has been widely utilized in developing various approaches for Alzheimer's disease analysis and automated brain segmentation (Marcus et al. 2007).

5.4 BreakHis

The BreakHis (Breast Cancer Histopathological) dataset contains 9,109 microscopic images of breast tumor tissue taken from 82 individuals at four magnifications (40X, 100X, 200X, and 400X) for image classification tasks. Currently, it has 5,429 malignant samples and 2,480 benign samples (700 × 460 pixels, 3-channel RGB, 8-bit depth per channel, PNG format). It was developed by Spanhol and colleagues at Brazil's Federal University of Paraná (Spanhol et al. 2016). Figure 14 shows a breast cancer histopathological image.

Fig. 14 Breast cancer histopathological image

5.5 ISIC 2019

The ISIC 2019 (International Skin Imaging Collaboration) dataset contains 25,331 high-resolution images of skin lesions obtained from different medical sources worldwide, covering benign lesions and malignant melanoma (Combalia et al. 2019). Notably, it subsumes the ISIC 2017 (Codella et al. 2018) and ISIC 2018 (Tschandl et al. 2018) datasets. The ISIC dataset is one of the largest publicly accessible datasets of its kind and has been utilized in developing and evaluating computer-aided diagnosis (CAD) systems for the early identification of skin cancer. Figure 15 shows some sample images of skin disease.

Fig. 15 Samples from the ISIC 2019 dataset

5.6 PH2

The PH2 dataset is a set of dermoscopic images of pigmented skin lesions produced for skin lesion analysis and diagnosis, named after the Hospital Pedro Hispano in Matosinhos, Portugal, where the images were acquired. It consists of 200 dermoscopic images: 80 common nevi, 80 atypical nevi, and 40 melanomas. The PH2 dataset has been utilized for building and evaluating ML models for segmentation, feature extraction, and classification tasks (Mendonca et al. 2013). Figure 16 shows some sample skin images from the PH2 dataset.

Fig. 16 Samples from the PH2 dataset

5.7 LIDC-IDRI

The Lung Image Database Consortium image collection (LIDC-IDRI) comprises over 1,000 chest CT scans and is freely accessible for the development, training, and evaluation of CAD approaches for lung cancer detection and diagnosis. It was established by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and actively supported by the Food and Drug Administration (FDA) (Armato et al. 2011). Figure 17 shows some sample lung images from the LIDC-IDRI dataset.

Fig. 17 Samples from the LIDC-IDRI dataset

5.8 DRIVE

The DRIVE (Digital Retinal Images for Vessel Extraction) dataset is a publicly accessible collection of retinal fundus images focused on the identification and segmentation of blood vessels in the retina. It includes 40 color fundus images at a resolution of 565 × 584 pixels. The blood vessels are manually segmented in each image, and these annotations may be used to train and test CAD approaches for automatic vessel segmentation (Korotkova et al. 2005). Figure 18 shows some sample eye images from the DRIVE dataset.

Fig. 18 Samples from the DRIVE dataset

5.9 STARE

The STARE (Structured Analysis of the Retina) dataset is a freely accessible dataset of retinal images utilized for developing and evaluating various CAD models of retinal diseases. It was produced by the Ophthalmic Biophysics Center at the Bascom Palmer Eye Institute. The STARE dataset consists of 20 fundus images, each with a resolution of 700 × 605 pixels, together with 20 sets of vessel segmentation data. Each image in the collection also includes ground truth data that identifies any anomalies, such as microaneurysms or hemorrhages, and describes their kind and location (Hoover et al. 2000).

5.10 MURA

The MURA (musculoskeletal radiographs) dataset is a large publicly available dataset produced by researchers at Stanford University that is frequently utilized for developing ML models that detect anomalies in musculoskeletal images. It includes 40,561 radiographic images of the upper extremity (elbow, finger, forearm, hand, humerus, shoulder, and wrist), with each study labeled as normal or abnormal (Rajpurkar et al. 2017). Figure 19 shows some sample bone images from the MURA dataset.

Fig. 19 Samples from the MURA dataset

5.11 CVC-ClinicDB

The CVC-ClinicDB dataset is a publicly available dataset consisting of 612 images, each with a resolution of 384 × 288 pixels, obtained from 31 colonoscopy sequences. It is used for medical image segmentation, specifically the identification of polyps in colonoscopy frames (Jha et al. 2019). Figure 20 shows some sample polyp images from the CVC-ClinicDB dataset.

Fig. 20 Samples from the CVC-ClinicDB dataset

5.12 2018 Data Science Bowl (2018 DSB)

The 2018 DSB dataset contains a substantial quantity of images depicting segmented nuclei. The images were acquired under diverse conditions and exhibit variations in cell type, magnification, and imaging modality (brightfield or fluorescence). The dataset is specifically designed to test a method's capacity to generalize across these variations (Z. Zhou et al. 2018).

6 Evaluation metrics

This section briefly presents and analyzes the commonly utilized evaluation metrics and their formulas for evaluating medical image preprocessing, segmentation, feature extraction, and classification approaches; the appropriate metrics vary depending on the particular task and application.

Peak Signal-to-Noise Ratio (PSNR): evaluates the fidelity of compression or denoising methods that are used to reduce the size of medical images without compromising their clinical information. PSNR computes the ratio between the maximum possible value of a signal and the amount of noise corrupting it, and is calculated using the following formula:

$$PSNR = 10 {log}_{10}\left(\frac{{M}^{2}}{MSE}\right)$$
(6)

where \(M\) represents the maximum pixel value of an image, \(MSE\) is the Mean Squared Error between the original and processed image, and the logarithm is base 10. The higher the PSNR value, the better the quality of the processed image.

Mean Squared Error (MSE): evaluates the accuracy of image processing approaches that are utilized to improve the quality or clarity of medical images, such as those obtained from CT, MRI, or ultrasound scans. MSE computes the average squared difference between the pixel values of the reconstructed image and the original image, with lower values indicating better performance. To calculate the MSE, first define two images: A (the original image) and B (the reconstructed image). Then MSE is calculated with the following formula:

$$MSE = \frac{1}{N}\sum_{i,j}{\left[A(i,j)-B(i,j)\right]}^{2}$$
(7)

where \(A(i,j)\) and \(B(i,j)\) are the pixel values at location \((i,j)\) in images \(A\) and \(B\), respectively, and \(N\) represents the total number of pixels in an image.
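
A minimal NumPy sketch of Eqs. (6) and (7), assuming 8-bit images so that \(M = 255\):

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sized images, Eq. (7)."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, Eq. (6), assuming 8-bit images."""
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(max_val ** 2 / m)
```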

Structural Similarity Index (SSIM): it is utilized for evaluating image enhancement and restoration methods. This metric calculates the structural similarity between the original and processed images, taking into account luminance, contrast, and structural information. SSIM can be computed using the following formula:

$$SSIM(x, y) = \frac{(2\mu_{x}\mu_{y}+C_{1})(2\sigma_{xy}+C_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+C_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2})}$$
(8)

where \(x\) and \(y\) are the two images being compared, \(\mu_{x}\) and \(\mu_{y}\) are their respective means, \(\sigma_{x}^{2}\) and \(\sigma_{y}^{2}\) are their respective variances, \(\sigma_{xy}\) is their covariance, and \(C_{1}\) and \(C_{2}\) are two constants used to stabilize the formula when the denominator is close to zero.

Mean Absolute Error (MAE): it calculates the average magnitude of errors between two images in enhancement, restoration, and segmentation approaches. The MAE is calculated by taking the absolute difference between the corresponding pixels in two images, summing these differences over all pixels, and then dividing by the total number of pixels. Formally, the equation for MAE is:

$$MAE = \frac{1}{N}\sum_{i,j}\left|A(i,j)-B(i,j)\right|$$
(9)

where \(N\) represents the total number of pixels in the images, and \(A(i,j)\) and \(B(i,j)\) are the pixel values at location \((i,j)\) in images \(A\) and \(B\), respectively.
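
MAE per Eq. (9) is a one-liner in NumPy; for SSIM (Eq. 8), a library implementation such as scikit-image's is typically preferred to hand-rolling the formula:

```python
import numpy as np
from skimage.metrics import structural_similarity  # library SSIM, cf. Eq. (8)

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error between two equally sized images, Eq. (9)."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))

# For two 8-bit grayscale images `orig` and `proc` (2-D uint8 arrays):
#   ssim_value = structural_similarity(orig, proc, data_range=255)
```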

Accuracy: a metric used in MIA to evaluate the performance of classification and segmentation approaches. It calculates the percentage of correctly classified or segmented pixels in an image in comparison to the ground truth data. Calculating accuracy requires four terms: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). In the context of image processing, these terms represent the following:

  • TP: number of pixels correctly classified or segmented as positive (belonging to a particular class).

  • TN: number of pixels correctly classified or segmented as negative (not belonging to that class).

  • FP: number of pixels incorrectly classified or segmented as positive (assigned to the class when they actually do not belong to it).

  • FN: number of pixels incorrectly classified or segmented as negative (excluded from the class when they actually do belong to it).

The accuracy formula can be calculated as:

$$Accuracy =\frac{TP+TN}{TP+TN+FP+FN}$$
(10)

Jaccard index: also called the Jaccard similarity coefficient or Intersection over Union (IoU). This metric evaluates the performance of image segmentation approaches in MIA by measuring the size of the intersection of two sets of pixels divided by the size of their union. The Jaccard index is computed with the following formula:

$$J(A, B)=\frac{|A\cap B|}{|A\cup B|}$$
(11)

where \(A\) and \(B\) are the two sets of pixels being compared, \(|A\cap B|\) is the number of pixels that are common to both sets and \(|A\cup B|\) is the number of pixels that are in either set.

Dice coefficient: this metric, similar to the IoU, is used to compare the similarity between two sets of pixels, such as the predicted and ground-truth segmentation masks, and calculates the overlap between them. The Dice coefficient is defined as twice the intersection of the two sets of pixels divided by the sum of their sizes.

$$Dice\, coefficient=2*\frac{|A\cap B|}{|A|+|B|}$$
(12)

where \(A\) and \(B\) are the two sets of pixels being compared, \(|A|\) and \(|B|\) are the numbers of pixels in each set, and \(|A\cap B|\) is the number of pixels common to both sets.
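
Both overlap metrics reduce to a few set operations on binary masks, as in this NumPy sketch of Eqs. (11) and (12); the convention of returning 1.0 for two empty masks is an assumption:

```python
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index / IoU of two binary masks, Eq. (11)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient of two binary masks, Eq. (12)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0
```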

Precision: a metric utilized for evaluating the performance of classification or segmentation approaches in MIA. It measures the ratio of true positive detections or classifications to the total number of positive detections or classifications. The precision is calculated as:

$$Precision= \frac{TP}{TP+FP}$$
(13)

Recall: also known as sensitivity, a metric utilized for evaluating the performance of classification or segmentation approaches in MIA. It measures the ratio of true positive detections or classifications to the total number of pixels that actually belong to the class being detected or classified. The recall is calculated as:

$$Recall= \frac{TP}{TP+FN}$$
(14)

Specificity: a metric utilized for evaluating the performance of classification or segmentation approaches in MIA. It measures the ratio of true negative detections or classifications to the total number of negative detections or classifications. The specificity is calculated as:

$$\text{Specificity}= \frac{TN}{TN+FP}$$
(15)

F1 measure: a metric utilized for evaluating the performance of classification or segmentation approaches in MIA tasks. It balances precision and recall and is particularly useful for imbalanced classes. It is the harmonic mean of precision and recall and can be calculated as:

$$F1 measure=2 *\frac{precision*recall}{precision+recall}$$
(16)
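
All five count-based metrics (Eqs. 10 and 13–16) follow directly from the four confusion counts, as in this plain-Python sketch; returning 0.0 for undefined ratios is a convention assumed here:

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall, specificity, and F1 from confusion counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp) if (tp + fp) else 0.0
    recall      = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, specificity, f1
```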

7 Quantitative analysis for MIA publications

This section quantitatively analyzes the most recent MIA publications in the literature. In total, 331 MIA articles are reviewed in this taxonomy and analyzed based on publisher, publication year, main category, sub-categories, image origin, and evaluation metrics.

7.1 MIA publication analysis based on publisher

This subsection distributes the reviewed articles by publisher. Figure 21 shows the reviewed articles organized into six categories, each representing a particular publisher: Elsevier, IEEE, MDPI, Springer, Wiley, and others, where "others" is a collective count for all publishers not directly listed. From Fig. 21, we can conclude that Springer, IEEE, and Elsevier have the highest publication counts with 59, 58, and 55 articles respectively, whereas MDPI has the lowest with 16 articles. It is worth noting, however, that MDPI is known for open-access publishing and its presence has been growing in recent years.

Fig. 21 Publication distribution over publisher

7.2 MIA publication analysis based on the year

This subsection analyzes the MIA publications based on publication year. The publications are divided into seven classes: one for articles published prior to 2018 (denoted "<2018") and one for each year from 2018 to 2023. Figure 22 depicts the distribution of MIA research articles organized by publication year. It clearly illustrates that 20 of the reviewed articles on MIA approaches were published before 2018, and that the publication rate has increased remarkably in recent years. The years 2019 and 2020 saw a notable surge in the quantity of reviewed articles, with 60 articles. Overall, the number of published articles grew consistently from 2018 onwards, with the exception of a decline in 2022, when only 53 articles were reviewed. The total of 29 reviewed articles for 2023 reflects the incomplete data for that year, which was still in progress at the time of this taxonomy.

Fig. 22 Publication distribution over publication year

7.3 MIA publication based on category

This section analyzes the publications of the reviewed MIA approaches by category. Of the 331 reviewed MIA articles in this taxonomy, 37.2% (123) relate to segmentation approaches, 26% (86) to classification approaches, 16.3% (54) to feature extraction approaches, 15.1% (50) to preprocessing approaches, and 5.4% (18) to visualization approaches, as clearly depicted in Fig. 23.

Fig. 23 Publication distribution over selected categories

7.4 MIA publication based on sub-category

In this subsection, we analyze the publications of the reviewed MIA approaches based on sub-categories. The 50 reviewed articles in the preprocessing stage are distributed into three sub-categories: color space transformation, de-noising, and enhancement, with 11, 24, and 15 reviewed articles respectively. The 123 reviewed articles in the segmentation stage are distributed into five sub-categories: thresholding, region, edge, clustering, and DL (supervised and unsupervised), with 8, 20, 8, 21, and 66 reviewed articles respectively. The 54 reviewed articles in the feature extraction stage are distributed into three sub-categories: color, texture and shape, and DL, with 5, 14, and 29 reviewed articles respectively. The 86 reviewed articles in the classification stage are distributed into three sub-categories: ML, DL, and others, with 61, 21, and 4 reviewed articles respectively. The 18 reviewed articles in the visualization stage are distributed into four sub-categories: DVR, IVR, glyph, and geometry, with 6, 4, 5, and 3 reviewed articles respectively. Table 11 details the MIA article analysis based on sub-categories.

7.5 MIA publication based on image origin

This subsection of the taxonomy distributes the reviewed articles based on the origin of the images used within those articles. Figure 24 illustrates several categories, each denoting a specific organ, plus a category labeled "others" that encompasses articles not directly falling into the predefined categories, suggesting a diverse range of medical image origins. From Fig. 24 we can conclude that the brain has the highest number of reviewed articles, totaling 75, indicating a significant emphasis on brain-related imagery and possibly reflecting the importance of neurological research, whereas the skin and eye have the fewest reviewed articles, totaling 22.

Fig. 24 Publication distribution over selected origin

7.6 MIA publication based on evaluation metrics

This subsection provides a quantitative analysis of the reviewed MIA articles based on the evaluation metrics used in the two most significant MIA stages: segmentation and classification.

7.6.1 MIA publication based on evaluation metrics for segmentation approaches

From Fig. 25 and Table 9, among the analyzed metrics accuracy has the highest number of articles (53), highlighting its broad use in evaluating the overall effectiveness of medical image segmentation models. Dice and sensitivity follow closely with 50 and 39 articles respectively. Metrics such as Jaccard (24 articles), specificity (30 articles), and precision (16 articles) have been used to measure various elements of model performance, whereas F-measure (12 articles), IoU (10 articles), HD (9 articles), and PT (7 articles) appear in the fewest reviewed articles.

Fig. 25 Publication distribution over used evaluation metrics for segmentation approaches

7.6.2 MIA Publication based on evaluation metrics for classification approaches

From Fig. 26 and Table 10, the accuracy metric has been utilized the most, appearing in 58 articles, highlighting its broad use in evaluating the overall effectiveness of medical image classification models. Sensitivity ranks second with 44 articles, whereas F-measure has the lowest count at 21 articles.

Fig. 26 Publication distribution over used evaluation metrics for classification approaches

8 Findings and discussions

This section provides a detailed discussion on the overall findings observed from the qualitative analysis, comparative analysis, and quantitative analysis on MIA approaches in this study.

Qualitative analysis: Qualitative analysis of existing approaches in MIA across the major stages (preprocessing, segmentation, feature extraction, and classification) reveals notable observations. Examining current preprocessing approaches, as illustrated in Table 2, some studies attempt to enhance the PSNR value, such as the works in (Chen et al. 2019); (Benhassine et al. 2021); (Guo et al. 2022); (Rodrigues et al. 2019); (Elhoseny and Shankar 2019) and others, while others focus on noise elimination, as demonstrated by (Gupta et al. 2018a, b); (Anoop and Bipin 2019); (Gupta and Ahmad 2018). Additionally, efforts are directed towards improving image quality, with contributions from studies such as (Chervyakov et al. 2020); (Habeeb 2021); (Naimi 2022); (Chen, Zhou, et al. 2021a, b, c, d); (Ruikar et al. 2019); (Fan et al. 2023). Further endeavors concentrate on enhancing image contrast, as seen in works by (Yang et al. 2018); (Panse and Gupta 2021); (Malik et al. 2022); (Agarwal and Mahajan 2018); (Kuo and Wu 2019); (Pashaei and Pashaei 2023). Concurrently, there are initiatives to address time complexity issues, including (Elaiyaraja et al. 2022); (Chanu and Singh 2018); (Sagar et al. 2020); (Balasamy and Shamia 2021); (Luo et al. 2022).

The qualitative analysis of the reviewed segmentation approaches, outlined in Table 3, emphasizes the superior efficiency of DL approaches in the segmentation stage compared to traditional and statistical methods in MIA. The segmentation approaches attempt to tackle prevalent issues in MIA, including refining segmentation accuracy, mitigating under- and over-segmentation, addressing time complexity, and enhancing model robustness. Numerous studies attempt to enhance segmentation accuracy, such as (Sharma et al. 2019); (Raja et al. 2018); (Khan et al. 2023); (Braiki et al. 2020); (Kumar et al. 2021); (Meng et al. 2020); (Rehman et al. 2020); (Cao et al. 2023); (Dayananda et al. 2022); (Yao et al. 2023) and others. Yet, due to the inherent complexity of anatomical structures in certain body parts, segmentation tasks pose difficulties, as seen in works such as (Zeebaree et al. 2019); (Bafna et al. 2018); (Huang et al. 2019); (Gu et al. 2019); (Ma et al. 2018). In contrast, alternative approaches focus on alleviating time complexity, as evidenced by (Feng et al. 2023); (Tamilmani and Sivakumari 2019); (Wang et al. 2019); (Cabeza-Gil et al. 2022); (Azad et al. 2022) and others. However, challenges persist, including under- and over-segmentation, managing large datasets or high-resolution images, and addressing interpretability concerns in MIA. In light of these challenges, it is apparent that medical image segmentation requires continued attention. Researchers are strongly encouraged to address the aforementioned issues in their future works, fostering advancements that overcome the complexities associated with segmentation tasks in the ever-evolving landscape of Medical Image Analysis.

In contrast, the qualitative analysis of the reviewed feature extraction approaches makes evident, as seen in Table 4, that various studies have employed diverse methods to extract informative features. This diversity aims at enhancing classification accuracy, as demonstrated by studies such as (Homeyer et al. 2013); (Mall et al. 2019); (Karthikamani and Rajaguru 2022); (Madhu and Kumar 2022); (Venkatachalam et al. 2021); (Aswiga et al. 2021); (D. Chakraborty et al. 2023). Moreover, some studies, including (Laouamer 2022); (Nitish et al. 2020); (Sari and Gunduz-Demir 2019); (Mohite and Gonde 2022), emphasize the importance of these approaches in enhancing model robustness, while others, such as (Ahmed 2020); (Sucharitha and Senapati 2019); (Barshooi and Amirkhani 2022), focus on simplifying the classification process.

Conversely, the qualitative analysis of the reviewed classification approaches makes it obvious from Table 5 that ML and DL methods are more effective than traditional and statistical methods during the classification phase. The classification approaches seek to address the prevalent challenges encountered in MIA, including but not limited to accuracy, classification performance, model robustness, model complexity, processing time, and transparency. Some studies attempt to enhance classification accuracy, such as (Thayumanavan and Ramasamy 2021); (Shaheed et al. 2023); (Sejuti and Islam 2023); (Korkmaz and Binol 2018); (Yu et al. 2023); (Saturi and Parvataneni 2022). Similarly, others focus on improving classification performance, as demonstrated by (Deepak and Ameer 2021); (Pitchai et al. 2023); (Bhavani and Jiji 2018); (Hamed et al. 2021); (Vijila Rani and Joseph Jawhar 2022); (Rączkowski et al. 2019); (Ibrahem Alhayali et al. 2020); (Nayak and Kengeri Anjanappa 2023) and more. Certain works diligently address the challenge of model robustness, as seen in (Mazin Abed Mohammed 2020), (Rahman et al. 2021), and others.

Despite these commendable efforts, a critical observation emerges regarding the persistent challenges. Existing classification approaches grapple with high complexity, exemplified by works such as (Vijh et al. 2023) and (Gour et al. 2020). Additionally, they are often time-consuming, as demonstrated by studies including (Jackins et al. 2021) and (Alroobaea et al. 2020), and lack transparency, as evidenced by (Prakash & Saradha 2021) and (R. K. Gupta et al. 2022). In light of these challenges, researchers are strongly advised to conscientiously consider these issues in their future works, fostering advancements that address the complexity, processing time, and interpretability aspects of classification approaches in the dynamic landscape of Medical Image Analysis (Tables 6, 7, 8, 9, 10, 11).

Table 6 Performance analysis
Table 7 Performance analysis
Table 8 Summary of Commonly used datasets for medical image analysis
Table 9 Comparative analysis on the utilized evaluation metrics in prominent segmentation approaches
Table 10 Analysis of the evaluation metrics utilized in prominent classification approaches
Table 11 Publication distribution over selected sub-categories

In ML and DL, each method bears specific benefits and limitations. The optimal selection of a method depends on the distinctive characteristics of the medical image dataset, the task requirements, and the available resources. A detailed breakdown of the limitations associated with each ML- and DL-based MIA method is summarized in Table 12.

Table 12 Limitations of ML and DL based MIA methods

Comparative analysis: Comparative analysis of nine prominent approaches from the literature over five publicly available datasets reveals some significant observations. In the ISIC 2018 dataset, DCSAU-Net demonstrates superior accuracy, recall, and F-measure, and DoubleU-Net impressively achieves a precision of 94.59%, maintaining a well-balanced performance, while DS-TransUNet excels in mIoU with a notable score of 85.23%. Conversely, TransUNet exhibits relatively lower precision. In the CVC-Clinic dataset, DS-TransUNet consistently attains the highest scores across recall (95.00%), F-measure (94.22%), and mIoU (89.39%); FANet excels in accuracy, boasting an impressive 99.16%, while DoubleU-Net showcases a higher precision of 95.92%. In contrast, U-Net and ResUNet++ display comparatively lower performance in specific metrics.

In the 2018 DSB dataset, DS-TransUNet distinguishes itself as a leading performer in the recall, F-measure, and mIoU metrics, FANet demonstrates exceptional accuracy, and DoubleU-Net excels in precision, although it is noteworthy that DoubleU-Net records the lowest recall values. In the DRIVE dataset, FANet is efficient in semantic segmentation tasks and consistently superior in all metrics except specificity, where U-Net excels. FANet is also consistently efficient in the EM dataset, outperforming the other methods across all evaluated metrics and establishing itself as the leading segmentation model. Although U-Net serves as a baseline, its lack of reported values for some metrics limits a thorough assessment of its strengths and weaknesses. Selecting an appropriate model requires careful consideration of these nuanced results, tailored to the specific requirements of the task at hand and the medical image dataset.

Quantitative analysis: Some observations are highlighted regarding the publication of the reviewed MIA articles based on the quantitative analysis. Considering the publications of each main category (preprocessing, segmentation, feature extraction, and classification) over the timeline, the most productive years were 2019 to 2022. We also observed from Fig. 23 that MIA-based approaches have given the most attention to the segmentation and classification stages: segmentation-based approaches score the highest publication rate among the MIA categories, followed by classification-based approaches. From Table 11, it can be concluded that DL approaches are the most productive sub-category among the segmentation approaches, followed by clustering approaches; by contrast, researchers have given traditional segmentation approaches less attention. Among the classification approaches, ML approaches are the most productive sub-category, followed by DL approaches. Finally, we observed from Fig. 24 the medical image origins utilized by MIA approaches: brain images have received the most attention, whereas other origins such as skin, eye, heart, prostate, and liver have received far less. This signals an opportunity for researchers to investigate ML and DL for these origins (skin, eye, heart, prostate, and liver).

9 Open issues and challenges

Based on the findings of this study, issues such as data quality, large datasets or high resolution, variability in anatomical structures, lack of transparency, selection of relevant features, interpretability, class imbalance, subjectivity in labeling, and overfitting have been observed. This section briefly highlights these issues and challenges observed in the existing reviewed MIA approaches, which can help guide further enhancements to the performance of MIA approaches.

Data quality problem: In MIA, data quality problems can arise for several reasons, such as limitations in the imaging equipment (e.g., lower-resolution scanners), poor image acquisition, data compression, or other factors that reduce the level of detail in the images.

  • High levels of noise: The presence of noise in medical images may substantially degrade their quality, posing difficulties in extracting precise information and potentially affecting accuracy during the diagnosis and analysis process. Several methods have been developed to tackle this issue. It is crucial to acknowledge that the selection of a methodology is contingent upon the particular imaging modality, the characteristics of the noise, the intended analysis result, and the balance between noise mitigation and the retention of significant image details. Hybrid approaches have attained the optimal outcomes in noise reduction for MIA.

  • Low resolution: Low resolution in medical images presents major challenges for achieving accurate and reliable diagnoses. Several techniques have been developed to tackle this problem. The primary determinant for enhancing poor resolution lies in the selection of appropriate methodologies, taking into consideration the balance between computational complexity, available data, and the desired enhancement in image resolution.

  • Low contrast: The issue of poor contrast in MIA pertains to scenarios where the disparities in intensity or grayscale values among distinct structures or tissues in a medical image make them hard to discern, hence impeding precise diagnosis and segmentation. Enhancement techniques are viable approaches for addressing poor contrast; their main objective is to augment the contrast and thus the detectability of structures within the image, while maintaining the integrity of the original details.

Scalability: The problem of large datasets or high-dimensional feature spaces in MIA pertains to the difficulties encountered when dealing with vast quantities of data or images with very high levels of detail. Medical imaging technologies such as MRI, CT, and histopathology produce highly detailed images, leading to substantial amounts of data that can pose challenges for effective management, processing, and analysis.

Model complexity: Model complexity pertains to the number and kind of parameters, characteristics, and interactions utilized by a particular approach to learn from the data. It encompasses computational intensity, computational resources, processing time, and computational cost.

Variability in anatomical structures: The diversity of anatomical structures makes it difficult to segment or recognize them in images. Some people may have less defined organ borders than others, which can raise a segmentation challenge, and such variations in anatomical features may impact the accuracy of MIA methods. What constitutes a "normal" size or shape for a given organ may also differ between patients, which might lead to an incorrect diagnosis.

Limited interpretability: Interpretability is a critical challenge in MIA. Local explanations delineate the specific traits and characteristics of a given image that an approach deems significant for making its prediction, whereas global explanations seek to discover the shared features the approach considers when linking images to a certain class.

Selection of relevant features: This is a critical challenge in MIA, involving determining which aspects or features of an image are most informative for a specific analysis or research purpose. DNNs are capable of learning relevant features directly from the data; however, understanding and interpreting the features learned by deep models can be challenging.

Class imbalance: Class imbalance is a common challenge in medical image segmentation and classification. The term pertains to scenarios in which the distribution of categories within a dataset is heavily skewed, i.e., one or more classes have notably fewer instances than the others. Class imbalance may severely affect the accuracy and scalability of segmentation and classification approaches; hybrid approaches combining ML and DL can provide a balanced and effective solution.
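
One common mitigation, sketched below in PyTorch, is to weight the classification loss inversely to class frequency so that errors on the minority class are penalized more heavily; the class counts here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class_counts = torch.tensor([900.0, 100.0])       # e.g. benign vs. malignant samples
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)   # minority-class errors cost more
```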

Subjectivity in labeling: The subjectivity-in-labeling issue in medical image classification refers to the challenge posed by the inherent variability in how different medical professionals or annotators interpret and label images. This variability can arise from the complexity of medical conditions, uncertainty, and inter-observer differences, making the assignment of correct labels a challenging task. Addressing labeling subjectivity is therefore highly desirable to ensure the accuracy and reliability of medical image classification approaches. MIA approaches that use ML methods such as SVMs and random forests (RFs) are comparatively reliable and efficient here.

Overfitting: The problem of overfitting in MIA arises in many ML and data analysis applications. Overfitting occurs when a model achieves strong results on the training data but cannot transfer those results to new, unseen data. Overfitting must be avoided in MIA; therefore, finding the ideal balance between model complexity, data, and regularization strategies is crucial.

10 Conclusion

This study provides a comprehensive review, taxonomy, and analysis of recently proposed AI-based medical image processing approaches in healthcare. The taxonomy covers the major image processing stages (preprocessing, segmentation, feature extraction, classification, and visualization), classifying the existing AI-based MIA approaches for each stage based on the employed method and qualitatively analyzing them based on image origin, objective, method, dataset, and evaluation metrics to illustrate their strengths and weaknesses. Further, a comparative analysis is conducted to evaluate the efficiency of AI-based MIA approaches over five publicly available datasets (ISIC 2018, CVC-Clinic, 2018 DSB, DRIVE, and EM) in terms of accuracy, precision, recall, F-measure, mIoU, and specificity. The popular existing public datasets and evaluation metrics are highlighted and analyzed. Furthermore, a quantitative analysis is presented based on publisher, publication year, categories, sub-categories, image origin, and evaluation metrics to illustrate the trends. Finally, open research issues and challenges are highlighted to aid researchers in developing efficient approaches that satisfy current medical practitioners' needs. The findings of this research indicate that ML, DL, hybrid, and optimization approaches provide the best performance at all stages of the MIA system. Based on our findings, the key recommendation is the exploration and development of new, efficient MIA approaches that leverage advanced AI techniques and hybrid methodologies, seamlessly integrating all significant aspects while mitigating the existing level of complexity. Pursuing such strategies presents a promising and viable option for the future of medical image analysis.