Multimodal biomedical image retrieval and indexing system using handcrafted and deep convolutional neural network features

Advances in biomedical imaging equipment have produced a massive number of medical images generated by different modalities. Consequently, a huge volume of data has accumulated, making the retrieval of relevant cases complex and time-consuming. To resolve this issue, Content-Based Biomedical Image Retrieval (CBMIR) systems are applied to retrieve related images from databases of earlier patients. However, previous handcrafted-feature methods applied in CBMIR models have shown poor performance on many multimodal databases. In this paper, we focus on designing a CBMIR technique using Deep Learning (DL) models. We present a new Multimodal Biomedical Image Retrieval and Classification (M-BMIRC) technique for retrieving and classifying biomedical images from huge databases. The proposed M-BMIRC model involves three distinct processes: feature extraction, similarity measurement, and classification. It uses an ensemble of handcrafted features from Zernike Moments (ZM) and deep features from a Deep Convolutional Neural Network (DCNN) for the feature extraction process. Additionally, a Hausdorff distance based similarity measure is employed to identify the resemblance between the queried image and the images in the database. Moreover, the classification process is executed on the retrieved images using a Probabilistic Neural Network (PNN) model, which assigns class labels to the tested images. Finally, experimental studies are conducted on two benchmark medical datasets, and the results confirm the superior performance of the proposed model in terms of different measures, including Average Precision Rate (APR), Average Recall Rate (ARR), F-score, accuracy, and Computation Time (CT).


Introduction
Recently, the progressive development of digital systems, multimedia, and storage capacity has produced a massive number of images and multimedia contents. Medical and diagnostic work benefits greatly from these advances in digital storage and content processing, as clinics rely on imaging services that generate large quantities of clinical images. Hence, the deployment of an effective clinical Image Retrieval (IR) model is essential for physicians to browse these huge datasets. Many computational and management methods exist for handling and automating clinical images. An effective method, CBMIR, has been applied for diagnosing numerous diseases and is considered a productive management system for handling large quantities of information. Without such systems, the processing and filtering of data are highly complicated.
Traditional text-based retrieval methods extract textual details such as tags, which require manual annotation; this yields low efficiency and demands considerable time, manual intervention, and medical expertise. There is therefore an acute need for CBMIR methods that perform automatic classification and image retrieval. They can support medical Decision Support Systems (DSS) and clinical workflows in identifying related details from huge repositories. Content-Based Image Retrieval (CBIR) is a computer vision approach applied to explore related images in massive databases. This exploration depends on image properties such as color, texture, and structure derived from the image. The working principle of a CBIR system is based on the computed features (Liu et al. 2007). Initially, the images are represented by their features in a high-dimensional feature space. The affinity between images in the database is then stored, and the Query Image (QI) is compared in the feature space using distance measures such as the Euclidean distance. Therefore, the representation of image data by its characteristics and the affinity measure are both very important. Kumar et al. (2019) presented a biomedical image retrieval system which uses Zernike moments (ZMs) for extracting features from CT and MRI medical images. However, the presented model does not incorporate deep learning (DL) models and does not consider the highly correlated visual properties between various classes. Besides, several research works have applied the PNN for image classification (Afify et al. 2020; Bodyanskiy et al. 2020; Agostini et al. 2020).
In developing a CBMIR method, image classification is a major challenge because of the highly correlated visual properties between various classes, which results in degraded retrieval performance. To overcome this problem, advanced Machine Learning (ML) methods have been applied to achieve better classification. In the past decades, important developments were made in ML and Artificial Intelligence (AI), particularly in the Deep Learning (DL) approach. The major principle behind DL is analogous to the operation of the human brain, where data is processed by several layers of transformation. Hence, DL approaches have delivered better performance in CBIR domains. Additionally, DL modules are heavily involved in different clinical applications, such as Brain Tumor (BT) prediction, blood flow quantification and visualization, Diabetic Retinopathy (DR), and a large number of cancer prediction tasks. Therefore, there is a high demand for improving CBMIR operations because of the rapid development of clinical imaging models. Moreover, the extensive dissemination of Picture Archiving and Communication Systems (PACS) in medical centers has led to a progressive growth in the size of medical image collections. Hence, managing these massive clinical databases requires the deployment of efficient CBIR systems. Beyond managing the database, a dedicated CBMIR model is required to guide physicians in making significant decisions regarding a given disease. By retrieving similar images and their histories, physicians can make critical decisions regarding a patient's disorders and diagnoses. For clinical images, global feature extraction often fails to generate compact feature representations, since the medically significant details are localized in tiny regions of the image.
This paper presents a new Multimodal Biomedical Image Retrieval and Classification (M-BMIRC) technique to retrieve and classify biomedical images from huge databases. The proposed M-BMIRC model performs the retrieval and classification tasks through feature extraction, similarity measurement, and classification. The presented model uses an ensemble of handcrafted features from ZM and deep features from a DCNN for the feature extraction process. In addition, a Hausdorff distance based similarity measure is employed to identify the resemblance between the QI and the images in the database. Furthermore, the classification process is applied to the retrieved images using the Probabilistic Neural Network (PNN) model, which assigns the class labels of the test images. The rest of this paper is organized as follows: Sect. 2 presents the literature review of all related areas of the research. Sect. 3 provides the details of the proposed model. Then, the experimental validation is presented in Sect. 4. Finally, the conclusion of this work is given in Sect. 5.

Literature survey
Numerous CBMIR models have been proposed in the literature. In this section, we present brief reviews of the two existing families of CBMIR models: handcrafted features using classical models and deep features using DL models.

Handcrafted features
For clinical images, global feature extraction systems fail to produce compact feature representations of the tiny, medically significant regions of the image (Kumar and Gopal 2014). A method based on Bag of Visual Words (BoVWs) with SIFT features has been proposed for brain Magnetic Resonance Images (MRI) to diagnose Alzheimer's disease (Mizotin 2012). Laguerre Circular Harmonic Function coefficients (LG-CHF) have been employed as feature vectors for retrieval, surpassing retrieval systems relying on Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) characteristics. Additionally, a CBIR model for skin lesion images has been proposed in (Jiji and Raj 2015), offering a new approach that depends on the shape, color, and texture features of the processed image using regression trees. Maximum specificity and sensitivity have been reported for skin lesion images.
An effective clinical CBIR system has been proposed as a solution for physicians studying mammographic images, using various feature extraction methods, diverse metrics, and relevance feedback (Ponciano-Silva et al. 2013). Two CBIR techniques based on discriminative wavelet adaptation have been applied to classify a QI by estimating the optimal wavelet filter (Quellec et al. 2012). A regression function is tuned to enhance the retrieval performance by employing different wavelet bases for the QI, and an essential improvement in IR performance was attained. Moreover, a classification-based supervised learning model applied to biomedical IR has been presented (Rahman et al. 2011). It applies feature extraction and affinity fusion based on a multiclass Support Vector Machine (SVM), which is employed to predict the class of the QI. Thus, by filtering out irrelevant images, the search region for similarity estimation in massive databases is reduced.

Deep learning-based models
Recently, essential breakthroughs in DL have been achieved in clinical applications, which can be categorized into single-modality and multi-modality imaging methodologies. As a single-modality technique, a two-phase CBMIR method has been employed for automated retrieval of radiographic images. In the first stage, the major class label is assigned using CNN-based features, and in the second stage, the outlier images are removed from the detected class according to low-level edge histogram features. Alternatively, a CNN-based model was proposed in (Anthimopoulos et al. 2016) for the classification of interstitial lung disease (ILD) patterns by extracting ILD features from a dedicated dataset. A Restricted Boltzmann Machine (RBM)-based approach was developed to examine lung CT scans by integrating both generative and discriminative representation learning (Van et al. 2016). A CNN-based automated classification of peri-fissural nodules (PFN) was introduced in (Ciompi et al. 2015) for lung cancer screening. Then, a two-stage multi-instance DL model was proposed for classifying diverse body organs (Yan et al. 2016). First, a CNN is trained on local patches to separate the discriminative patches from the non-informative ones in the training samples. Second, the network is fine-tuned on the filtered patches, which are fed to the classification process.
A brief examination of DL in CAD was introduced in (Shin et al. 2016), considering three major factors: CNN structures, dataset scale, and transfer learning of CNNs. A fully automated 3D CNN model was proposed for detecting cerebral micro-bleeds (CMBs) from MRI (Dou et al. 2016). CMBs are tiny hemorrhages (HM) of blood vessels whose detection offers deep insight into cerebrovascular infections as well as cognitive dysfunctions. An effective CNN training technique has been proposed that dynamically selects negative instances during training, which yields optimal performance in HM prediction within color fundus photographs. A multi-view convolutional network (ConvNets)-based CAD model was presented for predicting pulmonary nodules in lung computed tomography (CT) scans (Setio et al. 2016). Among the multi-modality models, a DL-based technique for multiclass CBMIR was presented in (Adnan et al. 2017), which classifies multimodal clinical images. For example, an intermodal dataset composed of many classes with CT, MRI, fundus camera (Das et al. 2020), positron-emission tomography (PET), and optical projection tomography (OPT) modalities has been applied for training the network.

The proposed model
The development process of the proposed CBMIR model is depicted in Fig. 1. Firstly, a collection of handcrafted features is extracted from ZM and deep features are extracted using the Deep Convolutional Neural Network (DCNN) from the input image. Using the new QI as input, a Hausdorff distance based similarity measure is employed to determine the related images and retrieve them. Then, the PNN method is executed on the retrieved images to determine their actual class labels.

Feature extraction
In this section, we explain the integration of Zernike Moments (ZMs) with the DCNN to extract the features. The ZMs are the projections of the image onto the orthogonal Zernike polynomials (Gau et al. 2018). The input image is fed into the complex orthogonal Zernike polynomials to obtain the ZMs. The major benefit of ZMs is that they derive features capturing texture and shape attributes while containing a minimal amount of redundant data. They are obtained at diverse orders of moments to depict the images, are insensitive to outliers, and are rotation invariant.
Hence, the ZMs over the unit disk are computed as:

$$Z_{pq} = \frac{p+1}{\pi} \iint_{x^2+y^2 \le 1} f(x, y)\, V_{pq}^{*}(x, y)\, dx\, dy \tag{1}$$

where $Z_{pq}$ denotes the ZM of order $p$ with repetition $q$ for the image $f(x, y)$, $\pi \approx 3.14$, $V_{pq}$ is the orthogonal Zernike basis function, and $V_{pq}^{*}$ is the complex conjugate of $V_{pq}$, which is estimated as:

$$V_{pq}(r, \theta) = R_{pq}(r)\, e^{jq\theta} \tag{2}$$

and $R_{pq}(r)$ denotes the radial polynomial, determined as:

$$R_{pq}(r) = \sum_{s=0}^{(p-|q|)/2} \frac{(-1)^{s}\,(p-s)!}{s!\left(\frac{p+|q|}{2}-s\right)!\left(\frac{p-|q|}{2}-s\right)!}\; r^{p-2s} \tag{3}$$

The 0th-order approximation and the region outside the unit disk are treated so as to enhance the model performance. The ZM coefficients $f_{ZM}$ are obtained by evaluating the low orders of ZMs using Eq. (1). To extract low-order ZM features without assuming a massive number of features, the parameters $p$ and $q$ are restricted to the range 0 to 5, and the overall vector of Zernike coefficients becomes:

$$f_{ZM} = [Z_1, Z_2, Z_3, Z_4, \ldots, Z_m] \tag{4}$$

where $Z_1, Z_2, \ldots, Z_m$ are the ZM coefficients and $m$ denotes the overall count of ZM coefficients, which is determined by $p_{\max}$, the highest value of $p$.
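The computation of Eqs. (1)-(4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a square grayscale image mapped onto the unit disk, approximates the integral by a pixel sum, and uses the magnitudes $|Z_{pq}|$ for $0 \le q \le p \le 5$ with $p-|q|$ even as the feature vector.

```python
import numpy as np
from math import factorial

def radial_poly(p, q, r):
    """Radial polynomial R_pq(r) of Eq. (3)."""
    q = abs(q)
    out = np.zeros_like(r)
    for s in range((p - q) // 2 + 1):
        c = ((-1) ** s * factorial(p - s)
             / (factorial(s)
                * factorial((p + q) // 2 - s)
                * factorial((p - q) // 2 - s)))
        out += c * r ** (p - 2 * s)
    return out

def zernike_moment(img, p, q):
    """Discrete approximation of the Zernike moment Z_pq of Eq. (1)."""
    n = img.shape[0]
    # Map the pixel grid onto the unit disk
    y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    mask = r <= 1.0
    # V*_pq = R_pq(r) * exp(-j q theta)
    v_conj = radial_poly(p, q, r) * np.exp(-1j * q * theta)
    # Pixel area in unit-disk coordinates is (2/n)^2
    return (p + 1) / np.pi * np.sum(img[mask] * v_conj[mask]) * (2.0 / n) ** 2

def zm_features(img, p_max=5):
    """Feature vector f_ZM of Eq. (4): |Z_pq| for 0 <= q <= p <= p_max, p-q even."""
    feats = []
    for p in range(p_max + 1):
        for q in range(p + 1):
            if (p - q) % 2 == 0:
                feats.append(abs(zernike_moment(img, p, q)))
    return np.asarray(feats)
```

With $p_{\max} = 5$ this yields a 12-dimensional handcrafted feature vector per image; using the magnitudes makes the descriptor rotation invariant.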
Next, deep features from the DCNN model are employed. Researchers have observed that the human visual system resolves problems of classification, prediction, and identification through the robust operation of its nervous system. This observation motivates the use of biological visual systems as models for new data processing architectures. Cells in the cortex of the human visual system are sensitive to tiny regions and can detect local spatial correlations in an image.
The CNN structure applies two specific mechanisms: local receptive fields and shared weights. The activation of a convolutional neuron is estimated by convolving the local input with the weights $W$, which are shared across the entire input space; neurons within the same layer share identical weights. In general, a CNN is composed of convolutional and pooling layers. Each convolutional layer is followed by a pooling layer, reflecting the roles of complex and simple cells in the mammalian visual cortex [21]. The input data of a CNN is a tensor with a 3D spatial structure, where $(H, W)$, $(H', W')$, and $(H'', W'')$ denote the spatial dimensions of the input data, convolution kernel, and output data respectively, while $D$ and $D''$ denote the channel counts of the input and output feature maps. Given the input data $x \in \mathbb{R}^{H \times W \times D}$, a convolution filter $f$, and output data $y$, the signal $x$ is convolved with the filter $f$ to produce the signal $y$ as follows:

$$y_{i''j''d''} = b_{d''} + \sum_{i'=1}^{H'} \sum_{j'=1}^{W'} \sum_{d=1}^{D} f^{d''}_{i'j'd} \times x_{i''+i'-1,\; j''+j'-1,\; d} \tag{5}$$

where $b_{d''}$ refers to the neuron offset and $f^{d''}_{i'j'd}$ indicates the $i' \times j'$ convolution kernel matrix for the $d$th input channel and $d''$th output channel.
Once the convolution task is completed, the pooling layer is applied. The most commonly employed pooling operation is max pooling, which estimates the highest response of a feature channel in an $H' \times W'$ region. Pooling makes the feature map more robust to data distortion and provides a degree of invariance. Moreover, the pooling layer reduces the size of the feature map and limits the processing overhead.
The classical CNN processing flow is built on these structural features for image classification tasks. During training, forward propagation computes the output, which is compared against the ground-truth data labels. Stochastic Gradient Descent (SGD) is applied to modify the weights and parameters of the network, and several training iterations are performed to develop the network. The test data is then fed to the trained network: the features extracted from it are mapped against the learned representation, and the final outcomes are obtained from the classification layer of the trained model.

Similarity measurement
Feature extraction is a significant step of CBMIR, and similarity estimation is the other key step in IR. Presently, distance and correlation measures are generally applied for measuring similarity: the smaller the distance between the feature points of two images, the more similar the images are. Typical measures include the Euclidean distance, Hausdorff distance, Manhattan distance, and Earth Mover's Distance (EMD). The Euclidean distance is a simple similarity estimation model (Xiaoming et al. 2018) that is easy to understand, but it is sensitive to image deformation. The Manhattan distance depends on the rotation of the coordinate system during estimation; a mapping can be applied instead of the coordinate rotation. Finally, the Hausdorff distance does not correspond to the distance between individual point pairs; rather, it is a max-min distance. It is a fuzzy mapping between point sets that captures the overall affinity of the clinical images' feature sets. In this study, the Hausdorff distance between two clinical images is estimated. Given a query clinical image feature set $X = \{x_1, x_2, \ldots, x_N\}$ and a specific image feature set $Y = \{y_1, y_2, \ldots, y_M\}$ from the clinical image database, the forward Hausdorff distance is:

$$h(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert \tag{6}$$

That is, for each point $x$ in the query set $X$, the distance $\lVert y - x \rVert$ to its nearest point in $Y$ is computed, and the maximum of these minimum distances is taken. $h(Y, X)$ is named the backward Hausdorff distance, and its calculation is analogous to $h(X, Y)$:

$$h(Y, X) = \max_{y \in Y} \min_{x \in X} \lVert y - x \rVert \tag{7}$$

Eventually, the larger of $h(X, Y)$ and $h(Y, X)$ is selected as the Hausdorff distance between the queried clinical image and the specific image from the clinical image database:

$$H(X, Y) = \max\bigl(h(X, Y),\; h(Y, X)\bigr) \tag{8}$$
Using the distances defined above, the Hausdorff distance between the queried clinical image and each image from the database is determined. These distances are sorted from minimum to maximum, and the database images corresponding to the smallest Hausdorff distances are selected as the retrieval output.
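The max-min computation of Eqs. (6)-(8) can be sketched as follows, with each image's feature set stored as a NumPy matrix of row vectors (a minimal illustration, not the authors' implementation):

```python
import numpy as np

def directed_hausdorff(X, Y):
    """Forward distance h(X, Y) of Eq. (6):
    max over x in X of the min over y in Y of ||x - y||."""
    # Pairwise Euclidean distances between rows of X (N x d) and Y (M x d)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return d.min(axis=1).max()

def hausdorff(X, Y):
    """Symmetric Hausdorff distance H(X, Y) of Eq. (8)."""
    return max(directed_hausdorff(X, Y), directed_hausdorff(Y, X))
```

Retrieval then amounts to computing `hausdorff(query_feats, db_feats)` for every database image and keeping the images with the smallest distances.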

Image classification
In this stage, every retrieved image is classified using the PNN. The PNN has been widely applied to ship detection, noise categorization, and image classification. It is a nonlinear, parallel classifier based on the Bayesian minimum-risk criterion (Liao et al. 2018). Given a sample $x$ to be classified, the posterior probability $P(S_k \mid x)$ can be obtained from the PNN classifier. When the probability densities of the classes are estimated from the learned training instances, the identity of the class can be determined; consequently, the trained PNN is applied to compute the identity of $x$. A general PNN classifier is composed of an input layer, a pattern layer, a summation layer, and an output layer. The architecture of the PNN is illustrated in Fig. 2. Initially, the input layer neurons receive the measurements from the training instances and forward the data to the neurons of the pattern layer, which is fully connected to the input layer. The number of neurons in the input layer equals the length of the input vector, and the number of neurons in the pattern layer equals the number of training samples. In this approach, the pattern neurons are gathered into groups, and the $i$th neuron in group $k$ uses a Gaussian function $f_i^{(k)}(x, \sigma)$, $i = 1, 2, \cdots, m_k$, where $m_k$ is the number of neurons in group $k$, $k = 1, 2, \cdots, K$. The Gaussian function serves as a probability density function, given by:

$$f_i^{(k)}(x, \sigma) = \frac{1}{(2\pi)^{v/2}\, \sigma^{v}} \exp\!\left(-\frac{\sum_{j=1}^{v}\bigl(x_j - x_{ij}^{(k)}\bigr)^2}{2\sigma^{2}}\right) \tag{9}$$

where $v$ is the dimension of the input vector $x = (x_1, x_2, \cdots, x_v)$, $x_j$ is the $j$th component of the input vector $x$, and $x_{ij}^{(k)}$ is the $j$th component of the $i$th neuron in class $k$.
Here $\sigma \in (0, 1)$ is the smoothing parameter, estimated experimentally by comparing the corresponding classification accuracies; it is a significant factor in the error of the PNN classification approach. The outputs of the pattern layer are linked to the summation units according to the class of the patterns. The summation-layer neuron for each group adds the results obtained from its pattern-layer neurons:

$$p_k(x) = \frac{1}{m_k} \sum_{i=1}^{m_k} f_i^{(k)}(x, \sigma) \tag{10}$$

Lastly, the output-layer neuron of the winning class provides a value of 1 and the others provide 0; the value of 1 indicates the classifier's decision for the input vector. Specifically, the input vector is assigned to class $k$ when $p_k(x) > p_{k'}(x)$ for all $k' = 1, 2, \cdots, K$ with $k' \neq k$.
Therefore, the key objective of PNN training is to identify the best estimates of the probability density functions from the training instances and their labels, so as to put the classifier in a state of minimum error rate and risk. When a sample is examined and forwarded to the pattern layer, the result of each neuron is measured on the basis of the trained density function, and the final outcome is obtained by the computations in the summation and output layers. Applying the PNN as a classifier offers the following benefits: 1. It has a simple architecture and is simple to train; in the PNN-based probability density estimation, the weights of the pattern-layer neurons are taken directly from the input sample values. 2. The training process of the network is fast, and there is no need for prolonged retraining when groups are added or deleted. 3. It does not get trapped in local optima, and its precision is high compared with alternative classifiers; given sufficient training instances for the classification problem, the optimal solution under the Bayes criterion can be approached.
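The pattern, summation, and output layers of Eqs. (9)-(10) reduce to a few lines of NumPy for a single test sample. The sketch below is illustrative (function names, the default sigma, and the data layout are assumptions, not the authors' code):

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """PNN decision: the class whose average Gaussian kernel
    density p_k(x) of Eq. (10) is largest wins."""
    v = len(x)
    norm = (2 * np.pi) ** (v / 2) * sigma ** v
    scores = {}
    for k in np.unique(train_y):
        Xk = train_X[train_y == k]                 # pattern-layer neurons of class k
        d2 = np.sum((Xk - x) ** 2, axis=1)         # squared distances to x
        # Summation layer: mean of the pattern-layer Gaussian responses
        scores[k] = np.mean(np.exp(-d2 / (2 * sigma ** 2))) / norm
    return max(scores, key=scores.get)             # output layer: winning class
```

Note how the training samples themselves act as the pattern-layer weights, which is why PNN "training" is essentially just storing the labeled feature vectors.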

Experimental validation
The performance of the M-BMIRC model is validated on two benchmark datasets, NEMA CT (Shahid et al. 2022) and OASIS MRI (Siddiqu et al. 2022). The NEMA CT database comprises a set of 600 images with a resolution of 512 × 512 pixels under ten distinct class labels. The OASIS MRI database consists of a collection of 416 MRI images of size 208 × 208 pixels under four different class labels. A few sample test images are illustrated in Fig. 3, and the details are provided in Table 1. The parameter settings of the DCNN model are as follows: learning rate 0.05, batch size 500, maximum epoch count 15, dropout rate 0.2, and momentum 0.9. Table 2 and Fig. 4 present the detailed comparative retrieval results of the M-BMIRC model on the applied NEMA CT images. The experimental results indicate that the LNIP model shows insufficient retrieval performance, obtaining a minimum ARR of 0.1624, APR of 0.0867, and F-score of 0.1131. Next, the LNDP model shows noticeably better retrieval results than the LNIP model, with an ARR of 0.1642, APR of 0.1009, and F-score of 0.125. The LBP model depicts slightly higher retrieval results, with an ARR of 0.1659, APR of 0.0879, and F-score of 0.1149. Also, the LTDP model demonstrates moderate results, with an ARR of 0.1659, APR of 0.0897, and F-score of 0.1164. Furthermore, the DLTerQEP model shows manageable results, with an ARR of 0.1668, APR of 0.091, and F-score of 0.1177. Moreover, the LDGP model demonstrates better results than the earlier models, with an ARR of 0.1728, APR of 0.0905, and F-score of 0.1188. The LDEP model attains somewhat higher results, with an ARR of 0.1839, APR of 0.1027, and F-score of 0.1318. Eventually, the LBDISP model yields a manageable result, with an ARR of 0.1905, APR of 0.1011, and F-score of 0.1321.
Moreover, the LWP and LBDP models achieve near-identical and slightly better results than the other methods, except the OFMMs and M-BMIRC models. Although the OFMMs model produces near-optimal results, with an ARR of 0.7412, APR of 0.4851, and F-score of 0.5864, the proposed M-BMIRC model shows superior results, with an ARR of 0.7853, APR of 0.6285, and F-score of 0.7309. Table 3 and Fig. 5 present the detailed comparative retrieval results analysis of the M-BMIRC model on the OASIS MRI images.
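For reference, the per-query precision, recall, and F-score underlying the APR, ARR, and F-score figures above can be computed as follows (a minimal sketch with illustrative names; APR and ARR are these values averaged over all query images):

```python
def precision_recall_f(retrieved_labels, query_label, n_relevant_in_db):
    """Per-query retrieval metrics: precision over the retrieved set,
    recall against all relevant database images, and their harmonic mean."""
    n_hits = sum(1 for lab in retrieved_labels if lab == query_label)
    precision = n_hits / len(retrieved_labels)
    recall = n_hits / n_relevant_in_db
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Because recall is normalized by the total number of relevant images in the database while precision is normalized by the (smaller) number of retrieved images, the two can move in opposite directions as the retrieval cutoff changes.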

Conclusion
This paper has developed a new M-BMIRC technique for retrieving and classifying biomedical images from huge databases. The proposed model performs the retrieval and classification processes using feature extraction, similarity measurement, and classification. Firstly, an ensemble of handcrafted ZM features and deep DCNN features is extracted. Then, the Hausdorff distance based similarity measure identifies the images related to the query, and the PNN model assigns class labels to the retrieved images. The experimental results on two benchmark medical datasets confirmed the superior performance of the proposed model.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.