1 Introduction

Cancer is one of the leading causes of death worldwide and a significant barrier to improving life expectancy. A brain tumor is caused by the growth of abnormal cells inside the brain, which damages the brain’s key tissues and can progress to cancer [1]. The American Cancer Society (ACS) predicted that 24,810 people would have malignant tumors in 2023, with 18,990 dying as a result [2]. About 150 different types of brain tumors may be found in humans, and they fall into two groups: (i) benign tumors and (ii) malignant tumors [3]. While benign tumors develop slowly and do not spread to other tissues, destructive malignant tumors grow quickly [4]. The most common types of brain tumors are gliomas, meningiomas, and pituitary tumors [5]. Gliomas grow from glial cells in the brain. Meningiomas often develop on the protective membranes of the brain and spinal cord [6]. Pituitary tumors are benign and develop in the pituitary gland, which lies at the base of the brain and produces some of the essential hormones in the body. In clinical practice, MRI is among the most effective tools for in vivo, noninvasive visualization of brain tumor anatomy and functionality [7]. Early diagnosis and accurate classification of brain tumors are critical for saving patients’ lives. Manual interpretation is difficult and accounts for about 10-30% of all misdiagnoses, so computer-aided diagnosis (CAD) is vital for helping radiologists improve accuracy and reduce the time needed for image classification [8,9,10]. In radiology, the application of artificial intelligence (AI) can reduce error rates below those of human effort alone [11]. Machine learning and deep learning are branches of AI that can enable radiologists to locate and classify tumors quickly without requiring surgical intervention [12].
A convolutional neural network (CNN) is a deep learning approach that has recently achieved great success in medical imaging problems [13]. CNNs can automatically extract the most useful features and reduce dimensionality [14, 15]. With a CNN, traditional handcrafted features are therefore no longer necessary, because the network learns the important features on its own and uses them to produce the final output predictions. Various CNN models have been used for brain tumor classification (e.g., GoogleNet [16], AlexNet [17], SqueezeNet [18], ShuffleNet [19], and NASNet-Mobile [20]). CNN models have also achieved great success in other medical classification tasks, such as skin lesion classification, breast cancer classification, COVID-19 classification, diabetic retinopathy classification, and arrhythmia classification [21,22,23,24,25]. In this work, these five pre-trained CNN models are applied to the 3064-image T1W-CE MRI dataset [26] and combined by majority voting, so that the strengths of each model are exploited. Nevertheless, brain tumor classification is extremely challenging due to changing morphological structure, complicated tumor appearance in an image, and irregular illumination effects, which calls for an effective classification technique to support the radiologist’s decision. Every year, new classification methodologies emerge to overcome limitations in prior approaches. The contributions of this study can be summarized as follows:

  • This research proposes a hybrid technique based on majority voting to make the final classification decision for three categories of brain tumors (meningioma, glioma, and pituitary) using five fine-tuned pre-trained models to offer reliable and precise tumor classification and provide radiologists with an accurate opinion;

  • The majority voting technique is based on the prediction outputs of five fine-tuned, pre-trained models: GoogleNet, AlexNet, ShuffleNet, SqueezeNet, and NASNet-Mobile which are able to perform classification tasks well and compete with more advanced CNNs;

  • We have used a public brain tumor image dataset with minimal pre-processing steps in the training phase, and in the testing phase, images have been tested without any pre-processing steps;

  • Finally, the suggested technique has proved efficient against state-of-the-art methods for classifying three types of brain tumors, based on a variety of metrics such as accuracy, recall, precision, specificity, confusion matrix, and f1-score.

This paper is organized as follows: Section 2 reviews related work on brain tumor classification; Section 3 describes the proposed methods in detail; Section 4 presents the experimental results and discussion; and finally, Section 5 gives the conclusion and discusses future work.

2 Related works

Machine learning and deep learning are the two main techniques used for brain tumor classification [27, 28]. Machine learning methods such as k-nearest neighbor (KNN), support vector machines (SVM), decision trees, and genetic algorithms have been utilized in various studies [28,29,30,31]. Cheng et al. [26] proposed an approach to brain tumor classification that augmented the region of the tumor to improve the performance of classification. They utilized three techniques for feature extraction: a gray-level co-occurrence matrix, a bag of words, and an intensity histogram. They obtained an accuracy of 91.28%. On the other hand, a CNN can extract features well, classify with its final fully connected layers, and achieve high results in medical imaging. Recent works address the detection, segmentation, and classification of brain tumors from MR images, an urgent task for determining the appropriate treatment in a timely manner [32,33,34]. For example, Cherukuri et al. [35] used Xception as a backbone model and proposed a multi-level attention network (MANet) that included both spatial and cross-channel attention. They achieved an accuracy of 96.51% on the 3064-image T1W-CE MRI dataset. One of the major disadvantages of long short-term memory (LSTM) networks is that they are more computationally expensive and require more complex tuning of the network. Furthermore, since LSTMs are more complicated than CNNs and RNNs, it can be difficult to debug them and identify any problems that may arise.

Guan et al. [36] first improved the visual quality of the input image using contrast optimization and nonlinear strategies. Secondly, the tumor locations were acquired based on both segmentation and clustering techniques. Then, these locations were scored, used in parallel with the corresponding input image, and provided to EfficientNet to extract features. Thirdly, these locations were further optimized to improve detection performance. Finally, these locations were aligned and used to define tumor classes and their locations. They achieved an accuracy of 98.04% with fivefold cross-validation on the T1W-CE MRI dataset. However, this study has the drawback of increased computational cost because it requires the training of many networks. Badža et al. [37] proposed a CNN with 22 layers to classify three types of tumors (meningioma, glioma, and pituitary) in the T1W-CE MRI dataset. They tested the network using subject-wise and record-wise tenfold cross-validation on both the augmented and original image databases. The best accuracy, obtained using record-wise tenfold cross-validation with the augmented dataset, was 96.56% on the 3064-image T1W-CE MRI dataset. Deepak et al. [38] combined deep learning and machine learning by modifying a pre-trained GoogleNet with the Adam optimizer. When SVM or KNN was employed instead of the classification layer within the transfer learning model, the system’s accuracy increased. They obtained accuracies of 92.3%, 97.8%, and 98% for GoogleNet, SVM, and KNN, respectively, with fivefold cross-validation on the T1W-CE MRI dataset. This research supports the ability of the proposed system to precisely classify the three types of brain tumors, but it has drawbacks: first, as an independent classifier, the transfer-learned model performed relatively poorly; second, there was significant misclassification of meningioma class samples.

Díaz-Pernas et al. [39] proposed a method for brain tumor segmentation and classification on the T1W-CE MRI dataset. Their approach is based on sliding-window segmentation, in which each pixel is classified using an N \(\times \) N window. They applied data augmentation to the input data to prevent overfitting. The classification CNN has three pathways (small, medium, and large feature scales) for extracting features. The multiscale CNN can effectively segment and classify the three types of tumors with an accuracy of 97.3% on the T1W-CE MRI dataset. Sensitivity was lowest for meningioma because of the lower contrast between the tumor and healthy areas, which can be considered a limitation. Alhassan et al. [40] proposed a CNN with a hard-swish-based ReLU activation function. The proposed approach included three steps: First, an image pre-processing step using normalization was applied to enhance the visualization of the brain images. Then, the HOG feature descriptor was used to extract feature vectors from the normalized images. Finally, for the classification step, a CNN with a hard-swish-based ReLU activation function was used to classify gliomas, meningiomas, and pituitary tumors. They attained an accuracy of 98.6% for brain tumor classification on the T1W-CE MRI dataset.

Ghassemi et al. [41] utilized a pre-trained deep convolutional neural network as a GAN discriminator (DCGAN). The discriminator can learn and extract robust features from MR image structures. Then, a softmax layer was used within the GAN discriminator instead of the last fully connected layer. They achieved an accuracy of 95.6% for brain tumor classification on the T1W-CE MRI dataset. One drawback of the study is a limitation of the GAN: the network input size was 64\(\times \)64, which prevents the use of pre-trained architectures as discriminators because they need larger input sizes. Noreen et al. [42] used deep learning and machine learning for brain tumor classification. They applied fine-tuned Inception-v3 and Xception models to extract features, and in each case classification was performed by replacing the final layer with a softmax layer, SVM, KNN, random forest (RF), or an ensemble technique using machine learning. The Inception-v3-Ensemble method produced the best testing accuracy, 94.34%, among all proposed methods for brain tumor classification on the T1W-CE MRI dataset. The proposed methods are time-consuming and have high computational costs.

Gumaei et al. [43] proposed a regularized extreme learning machine (RELM) combined with a hybrid feature extraction method. First, the brain images were pre-processed using min–max normalization to boost the contrast of brain edges. Next, features of brain tumors were extracted with a hybrid method called PCA-NGIST, which uses a normalized GIST descriptor with PCA to extract the significant features. Finally, a RELM was used to classify brain tumor types. They achieved an accuracy of 94.233% with fivefold cross-validation on the T1W-CE MRI dataset.

Haq et al. [44] proposed a DCNN technique for detection and classification of brain tumors, which included three stages: first, a pre-processing step was applied in which (a) N4ITK was applied, (b) the images were normalized, (c) for each pixel in the image, the mean intensity value and standard deviation were calculated, (d) after completing N4ITK and normalization, the proposed model parameters were initialized, and (e) the values of each parameter were then updated. Data augmentation methods (flipping, translation, and rotation) were also applied. Second, they used a GoogleNet variant model with back-propagation, with and without conditional random fields (CRF) as a post-processing step that can assign image pixel features and their associated labels. They achieved 97.3% and 95.1% accuracy with and without CRF, respectively, on the T1W-CE MRI dataset. The proposed model has disadvantages: it is not suitable for classification tasks involving small amounts of data; when the network receives erroneous information from various imaging modalities of patients, the classification performance suffers; and it requires many pre-processing steps.

Ghosal et al. [45] used a DCNN-based SE-ResNet-101 architecture that was fine-tuned to fit training data. For the pre-processing step, they applied ROI segmentation, intensity zero-centering, intensity normalization, and data augmentation methods such as elastic transform, flip, shear with different transformation degrees, and rotate. The SE-ResNet-101 model, which combines ResNet-101 with squeeze and excitation (SE) blocks, was then trained to classify three brain tumors (meningiomas, gliomas, and pituitary) on the T1W-CE MRI dataset. They achieved overall accuracy of 89.93% and 93.83% without and with data augmentation, respectively. Nawaz et al. [46] utilized three steps to classify brain tumors: firstly, they developed annotations to identify the exact location of the interest region. Second, to extract the deep features from the suspected samples, they implemented a custom CornerNet with a base network of DenseNet-41. Finally, they employed the one-stage detector with CornerNet to locate and classify several brain tumors. They achieved 98.8% based on the T1W-CE MRI dataset. The proposed method has the advantage of providing a low-cost solution to brain tumor classification, as CornerNet employs a one-stage object identification framework.

Verma et al. [47] suggested Hyper-Sphere Angular Deep Metric-based Learning (HSADML) with MobileNet as the backbone network, which used deep angular metric learning with SphereFace loss to improve generalization and robustness in classification. They achieved an overall accuracy of 98.69% on the T1W-CE MRI dataset. The suggested method had the benefit of enhancing inter-class separability and decreasing intra-class variability, which resulted in significant efficiency increases. The weakness of this research is that it did not focus on the backbone network; thus, there remains a gap for future work to introduce attention-based, domain-specific networks.

Cinar et al. [48] applied image cropping initially to eliminate any unneeded areas in the image and make sure that the model focused only on those areas. Then, they implemented different data augmentation techniques on the original images: rotating, flipping, zooming, cropping, and translation. They designed a CNN from scratch to classify different types of brain tumors based on the T1W-CE MRI dataset. They achieved an overall accuracy of 98.09%, 98.32%, and 96.35% where the dataset was divided at different rates (70–30%, 80–20%, and 90–10%) for training and testing, respectively. The suggested CNN has the advantage of demonstrating that categorization can be done without the aid of deep networks. The model also has some drawbacks; the training period is very long without the use of the transfer learning method. As a result, working with larger datasets proved impractical, and training the model with a large dataset like ImageNet would have been impossible on normal computers.

Deepak et al. [49] proposed a custom CNN model that consisted of five convolutional layers and two fully connected layers, where each convolutional layer was followed by a batch normalization layer, then a ReLU activation function, and then max pooling. For the pre-processing steps, the images were resized to 256\(\times \)256 and then normalized to the range [0, 1]. They ran two experiments using a fivefold cross-validation method: first, the CNN with a softmax classifier, and second, the CNN with an SVM, which improved classification results from 94.2% to 95.8%, where the dataset was divided into 70–30% for training and testing based on the T1W-CE MRI dataset.

Deepak et al. [50] introduced two approaches to enhance the expert system’s performance. The first, deep feature fusion, concerns the fusion of deep features extracted from CNN models trained with different loss functions; common classifiers such as SVM and KNN were used to categorize the fused deep features. In the second approach, majority voting, three distinct feature sets were extracted from CNN models trained with separate loss functions, and the same classifier was applied to each of the three feature sets, with the final decision made by majority voting. The experiments were applied to the TL ResNet-18 model to validate the suggested idea. For all experiments, the dataset was divided into training, validation, and test sets (60:20:20). With the CNN, they achieved accuracies of 94.8%, 95.3%, 94.9%, and 95.6% for deep feature fusion with SVM, deep feature fusion with KNN, majority voting with SVM, and majority voting with KNN, respectively. With the CNN, they achieved balanced accuracies of 94.2%, 94.5%, 93.9%, and 94.8% for deep feature fusion with SVM, deep feature fusion with KNN, majority voting with SVM, and majority voting with KNN, respectively. With TL, they achieved accuracies of 94.1% and 94.1% for deep feature fusion with KNN and majority voting with SVM, respectively. With TL, they achieved accuracies of 94.8% and 94.8% for deep feature fusion with KNN and majority voting with SVM, respectively. Compared with the existing methods, the proposed approaches based on a CNN trained with cross-entropy loss significantly enhanced the predictions of the three types of brain tumors. On the other hand, the proposed approaches have the drawback of increased computation, as they required training the CNN using three different loss functions, with 0.34G MAC operations used in each training iteration.

Kumar et al. [51] used three CNN models: AlexNet, ResNet50, and Inception V3. For the pre-processing step, the images were resized and normalized to the range [0, 1], and some augmentation techniques were performed, such as shear transformation, fill-up, crop, flips, translation, and rotation. This work demonstrated the efficiency of the CNN architecture in identifying enhanced MRI brain tumor images based on results obtained with ResNet50, AlexNet, and Inception V3, which achieved accuracies of 93.51%, 98.24%, and 92.07%, respectively. This work also has some disadvantages: (1) operations such as max pooling make a CNN much slower; (2) if the CNN has multiple layers, the training procedure will take a long time if the computer’s GPU is not up to par; and (3) a ConvNet needs a large amount of training data.

Briefly, although distinct techniques and algorithms have been discussed in related works for the classification of brain tumors, the majority of prior techniques have some limitations: (1) Traditional machine learning (ML) classifiers depend on handcrafted features, which are time-consuming and memory-intensive and reduce the efficiency of the system. (2) CNN approaches are getting more attention because they can extract features directly from input data, but they involve high complexity, require fixed input image sizes for each model, and are computationally expensive. (3) Selecting a CNN model with suitable hyperparameters to maximize performance in the classification process is a challenging task. (4) Most of the previous works suffered from a large number of pre-processing steps. In this research, to overcome these limitations, a minimum number of pre-processing steps is used, along with a careful choice of deep learning models with suitable hyperparameters. Furthermore, as previously noted, previous research studies promoted a single best-performing CNN model for the classification of brain tumor images. As a result, it is necessary to make the best use of the potential of the hybrid technique for this challenging task. To accomplish this, five fine-tuned pre-trained CNN models have been used: GoogleNet [16], AlexNet [17], ShuffleNet [19], NASNet-Mobile [20], and SqueezeNet [18]. These CNN models have been used in medical classification tasks and achieved good results, as discussed in [22, 23, 25, 52,53,54]. Fine-tuned pre-trained models achieved superior or equal classification results compared to other, more complicated CNNs for COVID-19 classification [23, 54]. Therefore, the hybrid technique of using multiple effective models can offer additional benefits over a stand-alone model, allowing for an accuracy rate that outperforms previous solutions.

3 Proposed methods

The proposed research methodology is illustrated in Fig. 1, which depicts an abstract representation of the proposed hybrid technique for classifying brain tumors using MRI images. The basic steps of the proposed hybrid technique are as follows: First, we downloaded the freely accessible T1W-CE MRI dataset [26], which includes meningioma, glioma, and pituitary MR images. Second, we randomly split the dataset into 70% training and 30% testing. Third, we applied data augmentation to the training images only; the testing images received no pre-processing, so only original images from the dataset were used to test the trained models. Fourth, the input MRI images of the dataset were resized to fit each CNN model’s input image size. Next, we employed five different fine-tuned pre-trained models, i.e., GoogleNet, AlexNet, ShuffleNet, SqueezeNet, and NASNet-Mobile, to compare their performance in classifying different categories of brain tumors. The suggested CNN models contained layers from pre-trained networks, with the last three layers replaced to accommodate the new image classes (meningioma, pituitary, and glioma). In the pre-trained SqueezeNet, by contrast, the last learnable 1-by-1 convolutional layer was replaced with a new 1-by-1 convolutional layer whose number of filters equals the number of classes, and WeightLearnRateFactor and BiasLearnRateFactor were set to 10 so that the new layer learns faster than the transferred layers. Then, the majority voting technique was applied using the three models with the highest accuracy. Finally, the proposed majority voting technique was applied to the combination of the outputs of the five models, treating them as a decision-making committee. The voting approach is effective in covering the classification errors of the individual models in the proposed methodology. 
The performance of the proposed technique was evaluated using standard performance measures such as overall accuracy, specificity, sensitivity, f1-score, and confusion matrix.
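The measures listed above all follow from the confusion matrix. As a minimal sketch (not the evaluation code used in this study), the snippet below computes overall accuracy and the per-class precision, recall (sensitivity), specificity, and f1-score in a one-vs-rest fashion. The function name `per_class_metrics` and the example counts are hypothetical and serve only as an illustration.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and one-vs-rest metrics per class.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        results.append({"precision": precision, "recall": recall,
                        "specificity": specificity, "f1": f1})
    return accuracy, results


# Hypothetical counts (rows: true class, columns: predicted class),
# ordered as meningioma, glioma, pituitary. Not results from this study.
example_cm = [[90, 5, 5],
              [4, 190, 6],
              [2, 3, 95]]
accuracy, metrics = per_class_metrics(example_cm)
print(f"overall accuracy: {accuracy:.4f}")
for name, m in zip(("meningioma", "glioma", "pituitary"), metrics):
    print(name, {key: round(value, 4) for key, value in m.items()})
```

Specificity is included alongside recall because, in a three-class problem, a model can reach high recall for one class simply by over-predicting it; specificity exposes that behavior.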

Fig. 1

Structure of the proposed hybrid technique for brain tumor classification

3.1 GoogLeNet

GoogleNet was the winner of the ILSVRC 2014 competition, reducing the number of parameters to 4 million, compared with 138 million in larger networks. The GoogleNet architecture utilizes nine inception modules and consists of 22 learning layers with four max pooling layers and one average pooling layer, as shown in Fig. 2a. To reduce overfitting, each fully connected layer contains a rectified linear activation (ReLU) function [16, 55]. The main objective of the inception module is to run multiple operations (pooling and convolution) with multiple filter sizes (1\(\times \)1, 3\(\times \)3, and 5\(\times \)5) in parallel, as shown in Fig. 2b. It can capture various patterns in the data using kernels and filters of different sizes, and then combine the features to provide an effective output. GoogleNet was fine-tuned by replacing the last three layers, the “loss3-classifier,” “prob,” and “output” layers, with a “fully connected layer,” a “softmax layer,” and a “classification output” layer to classify new images into the three tumor types (meningiomas, gliomas, and pituitary).

Fig. 2

a The GoogleNet architecture has two depths for all convolutional layers and inception modules; b Inception architecture

3.2 AlexNet

AlexNet was the first deep CNN architecture trained using 1000 different classes on 1.2 million images to win the ILSVRC competition in 2012 [17]. Figure 3 shows that it consists of five convolutional layers and three fully connected layers. The first two convolutional layers are connected by overlapping max-pooling layers in order to extract deep features. The third, fourth, and fifth convolutional layers are directly connected to the fully connected layers. All of the outputs of the convolutional and fully connected layers are connected to the ReLU nonlinear activation function [56]. A softmax activation layer is connected to fully connected layers to produce 1000 different classes. The input image for this network has a size of \(227 \times 227 \times 3\). The last three layers must be fine-tuned and replaced to classify the three types of brain tumors (meningiomas, gliomas, and pituitary).

Fig. 3

AlexNet architecture

3.3 SqueezeNet

SqueezeNet is a CNN that employs design strategies to reduce the number of parameters, in particular through the use of fire modules. Figure 4a shows that it contains 15 layers of 5 different types: 2 convolution layers, 3 max pooling layers, 8 fire layers, 1 global average pooling layer, and 1 softmax output layer [18]. Initially, a stand-alone convolutional layer named “conv1” is applied to an input image. Following this layer are eight “fire modules,” denoted “fire2” through “fire9.” In each fire module, a squeeze convolutional layer containing only 1\(\times \)1 filters feeds into an expand layer that combines 1\(\times \)1 and 3\(\times \)3 convolution filters to extract features at different scales (capturing spatial information). Then, “Conv1,” “Fire4,” “Fire8,” and “Conv10” are followed by max pooling with a stride of 2. Figure 4b illustrates that a fire module is formed of the squeeze layer, the ReLU activation, and the expand layers, which form the fire layers between the convolution layers [57]. Let FM denote the number of feature maps and C the number of channels; the output of the squeeze operation with kernel w, denoted \(f\{y\}\), can be expressed as in Eq. (1) [58]:

$$\begin{aligned} f\{y\}=\sum _{fm1=1}^{FM} \sum _{c=1}^{C} w_f^{c}x_c^{fm1} \end{aligned}$$
(1)
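To illustrate why the squeeze-then-expand design of a fire module reduces the parameter count, the following sketch compares a fire module with a plain 3\(\times \)3 convolution producing the same number of output channels. The channel sizes (96 input channels, 16 squeeze filters, and 64 + 64 expand filters) and the helper names are illustrative assumptions, not values taken from this study.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square 2D convolution layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def fire_params(in_channels, squeeze, expand1x1, expand3x3):
    """Parameters of a fire module: a 1x1 squeeze layer followed by
    parallel 1x1 and 3x3 expand layers that both read the squeeze output."""
    return (conv_params(in_channels, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))


# Assumed sizes for illustration only.
fire = fire_params(in_channels=96, squeeze=16, expand1x1=64, expand3x3=64)
plain = conv_params(in_channels=96, out_channels=128, kernel=3)
print(f"fire module: {fire} parameters")
print(f"plain 3x3 convolution: {plain} parameters")
print(f"reduction factor: {plain / fire:.1f}x")
```

Most of the saving comes from the squeeze layer, which shrinks the number of channels seen by the expensive 3\(\times \)3 filters.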

This CNN model must be fine-tuned to classify the three types of brain tumors (meningioma, glioma, and pituitary) as follows:

  • The last learnable layer and a final classification layer use the image features extracted by the network’s convolutional layers to categorize the input image;

  • Replace “conv10” with a new convolution2dLayer that has a filter size of [1,1] and NumFilters set to 3, the number of classes;

  • Set WeightLearnRateFactor and BiasLearnRateFactor to 10 to make the learning rates faster in the new layer than in the transferred layers.

Fig. 4

a SqueezeNet architecture; b The fire module in SqueezeNet

3.4 ShuffleNet

ShuffleNet is a computationally efficient CNN designed primarily for devices with limited computational abilities, such as mobile phones [19]. It can achieve better accuracy at low computational cost because it utilizes two operations (“channel shuffle” and “pointwise group convolution”). It has 172 layers with an input image size of \(224 \times 224 \times 3\). In Fig. 5a, a 3\(\times \)3 depthwise convolution is applied on the bottleneck feature map for the 3\(\times \)3 layer. In Fig. 5b, a ShuffleNet unit is created by replacing the initial 1\(\times \)1 layer with pointwise group convolution and channel shuffle. The second pointwise group convolution’s objective is to restore the channel dimension to match the shortcut path [59]. In Fig. 5c, there are two modifications:

  • Add a \(3\times 3\) average pooling on the shortcut path.

  • Replace element-wise addition with channel concatenation to easily increase channel dimension with a minimum extra computation cost.

All components of the ShuffleNet unit can be calculated efficiently thanks to pointwise group convolution with channel shuffle. The ShuffleNet model must be fine-tuned to classify the brain tumor classes: the last three layers of the model, i.e., the “fully connected layer,” the “softmax layer,” and the “classification output layer,” were replaced by a new “fully connected layer,” “softmax layer,” and “classification output layer” to classify new images into the three tumor types (meningioma, glioma, and pituitary).
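The channel shuffle operation mentioned above can be sketched in a few lines. The example below is a simplified illustration that treats each channel as a single list element; the function name `channel_shuffle` is hypothetical. It follows the usual formulation: view the channels as a (groups, channels per group) grid, transpose the grid, and flatten it again, so that the next group convolution receives channels from every group.

```python
def channel_shuffle(channels, groups):
    """Reorder channels so that each output group mixes all input groups.

    Equivalent to reshaping to (groups, n), transposing to (n, groups),
    and flattening, where n is the number of channels per group.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in three groups: (a0, a1), (b0, b1), (c0, c1).
shuffled = channel_shuffle(["a0", "a1", "b0", "b1", "c0", "c1"], groups=3)
print(shuffled)  # ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```

Without this step, stacked group convolutions would keep each group isolated, and information would never flow between groups.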

Fig. 5

Units of ShuffleNet. a the bottleneck unit with depthwise convolution (DWConv); b ShuffleNet unit with a pointwise group convolution (GConv) and channel shuffle; c ShuffleNet unit with a stride = 2

3.5 NASNet-Mobile

The Google Brain team developed the Neural Architecture Search Network (NASNet) [20], which is shown in Fig. 6a. NASNet frames the problem of determining the best CNN architecture as a reinforcement learning problem, with the ability to recall information in order to improve learning accuracy. Figure 6b shows that the smallest unit in the NASNet architecture is the block, and a combination of blocks is called a cell. It has two types of convolutional cells that are repeated several times [60, 61]. These are normal and reduction cells; both yield a feature map, as shown in Fig. 7:

  • A normal cell, which returns a feature map with the same dimensions as its input.

  • A reduction cell, which returns a feature map whose height and width are reduced.

There are two types of NASNet architecture: NASNet-Large and NASNet-Mobile. NASNet-Mobile has 5,326,716 parameters, whereas NASNet-Large has 88,949,818. Therefore, NASNet-Mobile is more reliable. The NASNet-Mobile model must be fine-tuned to classify the brain tumor classes: the model’s final three layers, the “fully connected layer,” “softmax layer,” and “classification output layer,” were replaced by a new “fully connected layer,” “softmax layer,” and “classification output layer” to classify new images into the three tumor types (meningioma, glioma, and pituitary).

Fig. 6

a NASNet-Mobile architecture; b Cell formation in NASNet architecture

Fig. 7

a NASNet-Mobile Normal Cell architecture; b NASNet-Mobile Reduction Cell architecture

3.6 Majority voting technique

The suggested technique is a hybrid that combines the prediction outputs of five base models to offer a robust predictive model for classifying brain tumors into meningioma, glioma, and pituitary tumors. The five deep learning models used to categorize brain tumors are GoogleNet, AlexNet, SqueezeNet, ShuffleNet, and NASNet-Mobile. Although each of these models performs well in classifying brain tumors individually, each is still far from an optimal classification system. The capabilities of these models can be combined to develop a more effective brain tumor classification technique. When the five models are combined, three or four of them may cover the mistakes of the remaining ones, reducing the total error of the system in the classification task, as shown in Fig. 8a. In the following sections, we discuss the results of each model and then demonstrate that the proposed majority voting technique improved outcomes when compared to related work.
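A minimal sketch of this hard majority vote is given below. It is an illustration under stated assumptions rather than the implementation used in this study: the function names are hypothetical, and the tie-breaking rule (a 2-2-1 split is resolved in favor of the tied label predicted by the earliest model in the list, e.g., the most accurate one) is an assumption, since the text does not specify how ties are handled.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the model predictions.

    `labels` holds one predicted label per model. Ties are broken in favor
    of the tied label that appears first in the list (an assumed rule).
    """
    counts = Counter(labels)
    best = max(counts.values())
    tied = {label for label, count in counts.items() if count == best}
    for label in labels:
        if label in tied:
            return label


def ensemble_predict(per_model_predictions):
    """Combine per-model prediction lists into one list of ensemble labels."""
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical predictions of five models for two test images.
predictions = [
    ["glioma", "pituitary"],       # model 1
    ["glioma", "meningioma"],      # model 2
    ["meningioma", "pituitary"],   # model 3
    ["glioma", "pituitary"],       # model 4
    ["pituitary", "glioma"],       # model 5
]
print(ensemble_predict(predictions))  # ['glioma', 'pituitary']
```

With five voters and three classes, at least two models always agree, so a winner is always defined; only the 2-2-1 case needs the tie-breaking rule.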

Fig. 8

a Majority voting; b Idea of fivefold cross-validation with split data

3.7 Fivefold cross-validation

For the experiments, the fivefold cross-validation methodology was used. With this technique, the dataset was randomly divided into five distinct sets. Each fold, i.e., each pair of training and test sets, contained different patient samples, as represented in Fig. 8b. Cross-validation evaluates the performance of the model more accurately and provides a more reliable prediction of how the model will perform on unseen data.
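The fold construction can be sketched as follows. This is a simplified, record-wise illustration with hypothetical names and an assumed random seed; it shuffles image indices and is not the exact procedure used in this study. A subject-wise variant would shuffle patient identifiers instead, so that all images of one patient fall into the same fold.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds.

    The sample indices are shuffled once and dealt into k disjoint folds;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 3064 is the number of images in the T1W-CE MRI dataset.
for fold_number, (train, test) in enumerate(k_fold_indices(3064), start=1):
    print(f"fold {fold_number}: {len(train)} training images, {len(test)} test images")
```

Averaging a metric over the five test folds gives an estimate that depends less on one particular split than a single train/test partition does.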

4 Results and evaluation

4.1 Dataset and pre-processing

A public brain tumor dataset from Figshare was used. It consists of 3,064 T1W-CE brain MRI slices acquired from 233 patients and was presented by Cheng et al. in 2017. These images were obtained over 5 years from Nanfang Hospital, Guangzhou, China, and Tianjin Medical University General Hospital, China, in 2015 and updated in 2017. The dataset contains three types of brain tumors: glioma (1,426 slices), meningioma (708 slices), and pituitary tumor (930 slices), each with three views: axial, sagittal, and coronal. The images are 2D slices with a resolution of 512\(\times \)512 and a pixel size of \(0.49 \times 0.49 \, \text {mm}^2\), stored in .mat format. Furthermore, three experienced radiologists manually delineated the tumor region in the MR images [26]. The data was randomly split into 70% for training and 30% for testing. Figure 9 shows the three types of brain tumors (meningioma, glioma, and pituitary), each in the three acquisition views (axial, coronal, and sagittal), shown from left to right, respectively. Table 1 provides supplementary details on the T1W-CE MRI dataset.

Fig. 9 Three categories of brain tumors: (a) Meningioma; (b) Glioma; and (c) Pituitary. The three acquisition views (axial, coronal, and sagittal) are shown from left to right, respectively

Table 1 Number of slices for each type of brain tumor (meningioma, glioma, and pituitary tumor) and the corresponding number of patients in the T1W-CE MRI dataset

Pre-processing optimizes the input data for the subsequent training step. We convert the data from .mat files into .png format, then duplicate the single grayscale channel to provide m\(\times \)n\(\times \)3 images. Data augmentation techniques can improve model training capabilities [37, 38]. The five models make use of two augmentation techniques: translation along both the x and y axes and reflection along the x axis. These techniques encourage the CNN to attend to the tumor wherever it appears in the image. Then, the MR images were resized to fit the input image size of each CNN model: (224\(\times \)224\(\times \)3) for GoogleNet, ShuffleNet, and NASNet-Mobile, and (227\(\times \)227\(\times \)3) for AlexNet and SqueezeNet.
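The pre-processing steps can be sketched on a toy image using plain Python lists. This is an illustration under stated assumptions rather than the authors' MATLAB pipeline: reflection along the x axis is interpreted as a left-right flip, translation pads with zeros, resizing uses nearest-neighbour sampling, and all function names are hypothetical.

```python
def to_three_channels(gray):
    """Duplicate a 2-D grayscale image (list of rows) into an m x n x 3 image."""
    return [[(px, px, px) for px in row] for row in gray]

def reflect_left_right(img):
    """Mirror each row (assumed meaning of 'reflection on the x axis')."""
    return [row[::-1] for row in img]

def translate(img, dx, dy, fill=(0, 0, 0)):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def resize_nearest(img, new_h, new_w):
    """Resize with nearest-neighbour sampling (interpolation method is assumed)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

# Toy 4 x 4 "slice"; the real images are 512 x 512
gray = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

rgb = to_three_channels(gray)
flipped = reflect_left_right(rgb)
shifted = translate(rgb, dx=1, dy=0)
resized = resize_nearest(rgb, 8, 8)   # 224 x 224 or 227 x 227 in the paper
print(len(resized), len(resized[0]), len(resized[0][0]))  # 8 8 3
```

Augmentation is applied before resizing here only for readability; the order of the two steps does not change the idea being illustrated.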

4.2 Evaluation metrics

The suggested model’s performance was evaluated based on its overall accuracy, specificity, sensitivity, precision, F1-score, confusion matrix, Matthews correlation coefficient (MCC), kappa, and classification success index (CSI). The F1-score is the harmonic mean of the recall and precision values. A harmonic mean is preferable to a simple mean because it accounts for extreme cases: with a simple average, a model with a recall of 1 and a precision of 0 would receive a score of 0.5, which is misleading, whereas its harmonic mean is 0 [62,63,64,65]. The mathematical representations are defined in Eqs. 2–11 as follows:

$$\begin{aligned}{} & {} {\text {ACC}} = \frac{\text {TP + TN}}{\text {TP + TN + FP + FN}} \end{aligned}$$
(2)
$$\begin{aligned}{} & {} \mathrm{{Sensitivity}} (\mathrm{{Recall}}) = \frac{\text {TP}}{\text {TP + FN}} \end{aligned}$$
(3)
$$\begin{aligned}{} & {} {\text {Specificity}} = \frac{\text {TN}}{\text {TN + FP}} \end{aligned}$$
(4)
$$\begin{aligned}{} & {} {\text {Precision}} = \frac{\text {TP}}{\text {TP + FP}} \end{aligned}$$
(5)
$$\begin{aligned}{} & {} {\text {F1-Score}} = \frac{2*\text {Precision * Sensitivity}}{\text {Precision + Sensitivity}} \end{aligned}$$
(6)
$$\begin{aligned}{} & {} {\text {MCC}} = \frac{\text {(TP * TN)}-\text {(FP * FN)}}{\sqrt{\text {(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)}}} \end{aligned}$$
(7)
$$\begin{aligned}{} & {} {\text {kappa}} = \frac{\textit{P}_o - \textit{P}_e}{1 - \textit{P}_e} \end{aligned}$$
(8)

where \(P_o\) is the observed agreement and \(P_e\) is the expected agreement, calculated in Eqs. 9 and 10, respectively:

$$\begin{aligned}{} & {} \textit{P}_o = \frac{\text {TP + TN}}{\text {TP + TN + FP + FN}} \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \textit{P}_e = \frac{(\text {(TP + FP)*(TP + FN)+(TN + FP)*(TN + FN)})}{\text {(TP + TN + FP + FN)}^2} \end{aligned}$$
(10)

The classification success index (CSI) is an evaluation tool that measures a classification model’s efficiency as the ratio of correctly classified positive samples to all samples that are either predicted positive or actually positive; true negatives do not enter the calculation.

$$\begin{aligned} {\text {CSI}} = \frac{\text {TP}}{\text {TP + FP + FN}} \end{aligned}$$
(11)
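Equations 2 to 11 translate directly into code. The sketch below assumes the four counts come from a single class treated one-vs-rest; the counts in the example are invented for illustration and are not results from this study, and the function does not guard against zero denominators.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. 2-11 from true/false positive/negative counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                                   # Eq. 2
    sens = tp / (tp + fn)                                     # Eq. 3
    spec = tn / (tn + fp)                                     # Eq. 4
    prec = tp / (tp + fp)                                     # Eq. 5
    f1 = 2 * prec * sens / (prec + sens)                      # Eq. 6
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))        # Eq. 7
    p_o = acc                                                 # Eq. 9
    p_e = ((tp + fp) * (tp + fn)
           + (tn + fp) * (tn + fn)) / total ** 2              # Eq. 10
    kappa = (p_o - p_e) / (1 - p_e)                           # Eq. 8
    csi = tp / (tp + fp + fn)                                 # Eq. 11
    return {"ACC": acc, "Sensitivity": sens, "Specificity": spec,
            "Precision": prec, "F1": f1, "MCC": mcc,
            "kappa": kappa, "CSI": csi}

# Hypothetical counts for one class (not taken from the paper)
metrics = binary_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

For these counts the accuracy is 92.50%, kappa is 85.00%, and CSI is 85.71%; CSI is lower than accuracy because the 95 true negatives are left out of its denominator.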

4.3 Hyper-parameters

The goal of hyperparameter optimization is to improve the performance of a particular deep learning model by selecting the most appropriate hyperparameters. Table 2 shows that all models were trained with stochastic gradient descent (SGD) for 50 epochs, with verbose output disabled, shuffling at every epoch, and a mini-batch size of 10 images. The GoogleNet and AlexNet models used a learning rate of \(10^{-4}\), while SqueezeNet, ShuffleNet, and NASNet-Mobile used a learning rate of \(2 \times 10^{-4}\); these values were chosen by trial and error. All experiments were run in MATLAB R2023a on a Dell G5-5587 laptop (Intel Core i7-8750H, 16 GB RAM, 1 TB hard disk drive (HDD), 256 GB solid state drive (SSD), GTX 1050 Ti 4 GB GPU, Windows 10).
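For reference, the settings above can be collected in one small configuration structure. The key names and the helper `training_options` are illustrative and do not correspond to any MATLAB API; only the values come from the text.

```python
# Settings shared by all five models
COMMON_OPTIONS = {
    "solver": "sgd",
    "max_epochs": 50,
    "mini_batch_size": 10,
    "shuffle": "every-epoch",
    "verbose": False,
}

# Model-specific learning rates, chosen by trial and error
LEARNING_RATE = {
    "GoogleNet": 1e-4,
    "AlexNet": 1e-4,
    "SqueezeNet": 2e-4,
    "ShuffleNet": 2e-4,
    "NASNet-Mobile": 2e-4,
}

def training_options(model_name):
    """Return the full option set for one model."""
    return {**COMMON_OPTIONS, "initial_learning_rate": LEARNING_RATE[model_name]}

print(training_options("ShuffleNet"))
```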

Table 2 Parameter values employed in training networks

4.4 Experimental results and discussion

This section discusses the performance of the five pre-trained DL models and the suggested hybrid technique for classifying brain tumors in the T1W-CE MRI dataset into meningiomas, gliomas, and pituitary tumors. At each training phase, 70% of the dataset is used for model training and validation. The performance of the suggested model was assessed using the remaining 30% of the dataset. Figure 10 shows the accuracy and loss of the training and test phases for the ShuffleNet and SqueezeNet models over fifty epochs. A confusion matrix is constructed from the model’s correct and incorrect predictions to evaluate the effectiveness of the proposed system.

Figure 11 shows the confusion matrices acquired during the experiments, where class “1” indicates “meningioma,” “2” indicates “glioma,” and “3” indicates “pituitary.” The pituitary class has the highest proportion of correctly classified samples. Table 3 compares the five fine-tuned pre-trained models, which were selected based on their effectiveness in classification tasks, to help select an appropriate model for image classification. Based on the testing results, GoogleNet, AlexNet, SqueezeNet, ShuffleNet, and NASNet-Mobile achieved accuracies of 96.08%, 95.16%, 96.67%, 97.17%, and 97.5%, respectively. NASNet-Mobile achieved the greatest accuracy of 97.50% and also led on the other assessment measures, with sensitivity (recall), specificity, precision, F1-score, MCC, kappa (K.), and CSI of 97.4%, 98.52%, 97.04%, 97.47%, 97.03%, and 97.61%, respectively. However, GoogleNet achieved the lowest accuracy of 95.16%, with sensitivity (recall), specificity, precision, F1-score, MCC, kappa (K.), and CSI of 95.21%, 97.58%, 95.15%, 95.16%, 95.13%, 95.15%, and 95.18%, respectively.
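Per-class counts can be read off a three-class confusion matrix by treating each class one-vs-rest. The sketch below assumes rows hold the true class and columns the predicted class; the matrix values are invented for illustration and are not those of Fig. 11.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, TN, FP, FN) for class k; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class-k samples predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as class k
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, tn, fp, fn

# Hypothetical confusion matrix; index 0, 1, 2 = meningioma, glioma, pituitary
cm = [[50, 3, 2],
      [4, 90, 1],
      [0, 2, 60]]

for k, name in enumerate(["meningioma", "glioma", "pituitary"]):
    tp, tn, fp, fn = one_vs_rest_counts(cm, k)
    print(f"{name}: TP={tp} TN={tn} FP={fp} FN={fn}")

overall_accuracy = sum(cm[k][k] for k in range(3)) / sum(map(sum, cm))
print(f"overall accuracy: {100 * overall_accuracy:.2f}%")
```

The resulting counts are the inputs that the evaluation metrics of Sect. 4.2 expect for each class.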

Fig. 10 The accuracy and loss of the training and test phases for the ShuffleNet and SqueezeNet models over fifty epochs. (a) ShuffleNet; and (b) SqueezeNet

The majority voting technique was first applied to the three models with the highest accuracy among the five (ShuffleNet, SqueezeNet, and NASNet-Mobile) and achieved an accuracy of 98.5%. To obtain more accurate image classification, majority voting was then applied to all five fine-tuned pre-trained models (GoogleNet, AlexNet, SqueezeNet, ShuffleNet, and NASNet-Mobile). This achieved an accuracy of 99.31% and outperformed both the individual fine-tuned models and the three-model majority vote. Thus, it has been shown to be the most effective technique for brain tumor classification.

Table 3 The ACC (%), Sen. (%), Spec. (%), Pre. (%), F1-score (%), MCC (%), K. (%), and CSI (%) for the proposed system and five different models using the T1W-CE MRI dataset

We compared the results obtained from the hybrid technique with the latest results in the literature that were evaluated on a 70–30% training–testing data split [46,47,48]. Table 4 presents a comparison of our proposed technique with that of Deepak et al. [50], who applied the majority voting technique and divided the same dataset into 60:20:20 for training, validation, and testing. The comparative results indicate that the majority voting technique is more efficient than the other techniques. The most distinctive feature of our proposed hybrid technique is that it is based on more than one model, so it is less likely to make errors than a single model. The proposed method provides a 0.51% boost in classification accuracy and average improvements of 1.24%, 1.1%, and 0.2% in precision, recall, and F1-score, respectively. It is therefore reasonable to conclude that the proposed hybrid technique is more robust for brain tumor classification and can be used to develop brain tumor classification systems based on MR images. Moreover, it will aid radiologists and surgeons in treating this deadly tumor, which causes so many deaths.

Table 4 Comparative results between the proposed system and the latest literature techniques on the same T1W-CE MRI dataset

Using fivefold cross-validation, the five pre-trained models GoogleNet, ShuffleNet, SqueezeNet, AlexNet, and NASNet-Mobile achieved accuracies of 96.31%, 97.55%, 97.9%, 96.7%, and 98.3%, respectively. Table 5 presents the results obtained from the five fine-tuned models with the fivefold cross-validation method. NASNet-Mobile outperformed the other models, achieving an accuracy of 98.3%. Compared with Cheng et al. [26] and Deepak et al. [38], NASNet-Mobile performed very well in classifying brain tumors, whereas GoogleNet achieved the lowest accuracy among the fine-tuned models.

Fig. 11 Confusion matrices comparing the five deep learning models in classifying classes 1, 2, and 3, which refer to meningioma, glioma, and pituitary tumors in the testing dataset, respectively. (a) AlexNet; (b) GoogleNet; (c) ShuffleNet; (d) SqueezeNet; and (e) NASNet-Mobile

Table 5 Comparative results for the classification of brain tumors using fivefold cross-validation on the same T1W-CE MRI dataset

5 Conclusion and future work

This research provides an efficient hybrid brain tumor classification technique that requires minimal pre-processing. The proposed technique applies majority voting to the prediction outputs of five different CNN models to obtain an accurate classification of brain tumors on the T1W-CE MRI dataset. The system demonstrated the highest classification accuracy when compared with related work on similar datasets. Several metrics were used to assess the system’s performance and determine its robustness. The proposed hybrid classification technique achieves an accuracy of 99.31%. Our automated classification technique can substantially reduce the effort and time required to classify brain tumors. Because this hybrid technique outperforms the existing models, radiologists can use the system as a second opinion, as it reduces computation time while increasing accuracy and reducing the rate of misclassification. Even with the improvements described in this research, there are significant limitations, such as the need for much more patient data, especially for the meningioma class, which has the lowest number of images among the three studied categories. In future work, the efficiency of the proposed technique will be investigated for other types of medical image analysis, such as lung cancer, multi-class skin lesion, and breast cancer classification. Finally, including normal brain CE MRI images in the dataset may provide further distinction for tumor classification.