1 Introduction

Brain tumors are among the most common causes of human death; thus, an early and accurate diagnosis is critical for an effective treatment process. In clinical neuroradiology, pre-treatment diagnosis of brain tumors using Magnetic Resonance Imaging (MRI) is challenging because, on contrast-enhanced T1-weighted MRI scans, their appearance is very similar to that of hyperintense brain lesions such as lipoma, dermoid cysts, and thrombosis [65]. Additionally, the human visual system has a limited capacity to distinguish between the different gray levels present in both MRI and computed tomography.

As brain cells renew themselves, abnormal cells that arise during the replication phase grow into a mass and form a brain tumor. Brain tumors fall into two groups: benign/primary (e.g. pituitary or meningioma) and malignant/secondary (e.g. glioma). While benign brain tumors do not spread, malignant tumors originating in other organs, such as the breasts and lungs, can spread to the brain and form metastases [48]. Gliomas, one type of malignant tumor, require a quick and accurate diagnosis because of their fast growth rate, their tendency to invade surrounding tissues, and their ability to metastasize to distant tissues. Therefore, the development of more effective approaches to both diagnosis and treatment is crucial. In addition, conventional MRI has limited ability to discriminate between primary tumors, metastases, and other central nervous system masses, because their radiological features appear similar. Artificial Intelligence (AI)-based research on existing data, comprehensive datasets from various sources, and retrospective analysis are required to guide decisions and to shed light on new avenues in both diagnostic and therapeutic processes [12].

With the development of computer vision technology, AI produces smart solutions in many fields, such as industry [43, 70], medicine (e.g. cancer detection) [20], early diagnosis and treatment of non-symptomatic liver disease [41], prediction of fatal malaria [55], thoracic surgery [21], face mask detection for COVID-19 prevention [5, 52, 54], nanotechnology [13], robotics [19, 72], and agriculture [53, 56, 74]. The Convolutional Neural Network (CNN) is one of the most commonly used neural networks in Deep Learning (DL), owing to its strong self-learning, adaptability, and generalization ability.

This article proposes a new CNN-based framework to detect tumors and categorize brain tumor types at the pre-diagnosis stage. In the proposed framework, the MR images are first cropped to the skull boundary, and then histogram equalization and a denoising filter are applied. Next, data augmentation is used to ensure stable learning. Finally, the pre-trained EfficientNetv2 + Ranger CNN model is trained with fine-tuned hyperparameters for brain tumor detection and pre-diagnosis.

The main contributions of this study are:

  1. To introduce a new CNN-based classification system with the EfficientNetv2 + Ranger architecture for three brain tumor types (glioma, meningioma, and pituitary).

  2. To show that data pre-processing is crucial for the accurate diagnosis of tumors.

  3. To provide better accuracy and a more stable learning procedure for the automated diagnosis of tumors than other CNN architectures in multi-class scenarios, using the Ranger optimizer on different datasets.

  4. To explore the behavior of recent optimization algorithms in CNN networks for MRI images.

  5. To propose an alternative method for the rapid diagnosis of brain tumors using computer-assisted radiological examination alongside neurological examination, and to provide a future-oriented guide intended to encourage other scientists to conduct advanced studies in this area.

The remainder of this work is organized as follows. Section 2 reviews related works. A detailed description of the basic algorithm, with some considerations on the pre-processing tasks, is presented in Section 3. In Section 4, the experiments are evaluated using both perceptual and classical quality criteria to demonstrate the effectiveness of our methodology. Section 5 discusses the detailed results in comparison with existing tumor classification methods, and Section 6 presents the conclusions and future work.

2 Related works

Computer-assisted smart healthcare systems have been developing rapidly in order to provide more coordinated, higher-quality service to patients after a disease diagnosis. With the emergence of MRI, it became possible to analyze both the anatomical status and the biochemical structure of brain morphology. Additionally, CNN-based methods have made tremendous progress in analyzing the growing quantity and diversity of distinct tumor types. [27] compared different CNN architectures and addressed problems related to the instability of tumor labels using a two-step training procedure. Unlike traditional CNN models, the two-step training procedure combined the averaged outputs from both local details and global texture features. [71] solved the medical image classification problem by using features from segmentation networks, which allowed the learning procedure to be executed more easily and robustly in real classification problems involving complex structures. Their study compared ImageNet pre-trained classifiers with classifiers trained from scratch and demonstrated that the ImageNet pre-trained VGGNet [64] yielded more successful results. Similarly, [38] developed a new abnormal brain disease categorization covering several classes, i.e., sarcoma, meningioma, glioma, and metastases, through transfer learning with the VGGNet neural network. In this approach, the last few layers of VGGNet were updated to embed new image categories by including a pre-trained model in the learning procedure. Although this model has a very long training time, it achieves slightly better accuracy than existing studies according to standard performance metrics. [40] established an algorithm that classifies brain images as either tumor or non-tumor using ensemble learning techniques based on pre-processing, segmentation, and feature extraction. As an alternative, [6] developed an Extreme Learning Machine (ELM)-based method evaluated by combining cropped, uncropped, and segmented lesions of different dimensions from T1-weighted scans [16]. In [51], traditional machine learning algorithms and CNN methods developed for brain tumor detection were broadly compared, including feature extraction and classification. Unlike the other studies, [58] described a brain tumor diagnosis algorithm that included image segmentation based on the Unet architecture using advanced CNN architectures. In comparison to previous approaches, this strategy was built on the VGG16 network as the backbone of the Unet architecture. In particular, since clinical applications may include patients with different pathological tumor findings, the results should be accurate and obtained within a narrow time frame. Advances in various experimental imaging techniques, including the early detection of brain abnormalities, are helping to increase the popularity of MRI as a complementary modality in patients' clinical preparation for treatment. Another approach is to use an evolutionary algorithm to adapt reinforcement learning to classify and detect brain tumors [60]. This involves two phases of pre-processing: freezing and fine-tuning. The significant features are then extracted from the MRI slices. However, this approach is not applicable when the segmentation is inaccurate, as it depends directly on segmentation performance.
As an alternative to the previously mentioned research, [4] found that a CNN based on the ReLU-derived hard-swish activation function could better extract edge and texture features in order to detect cancerous tissue. Many powerful CNN models have recently achieved remarkable progress in computer-assisted clinical programs, including segmentation [9, 34], diagnosis, and classification [8, 59, 69] of medical images using radiological data.

In terms of weights, biases, and other learning parameters, ResNet50 [28], DenseNet201 [33], MobileNetv2 [31], InceptionV3 [66], and NASNet [75] are capable of high-level brain tumor identification and of learning a large number of key features. One of the major drawbacks of the systems built on them is the small amount of training data used to categorize brain tumors. Additionally, some CNN architectures such as VGGNet, KE-CNN, and ResNet50 may perform insufficiently in both tumor detection and classification when determining subtle alterations of brain morphology on MRI, such as changes in total brain volume, corpus callosum size, or total white matter volume. These architectures can easily memorize the underlying data patterns, which can lead to overfitting, poor generalization, and difficulty in interpreting the results. In this study, to address these drawbacks, we pre-processed the whole dataset before feeding it to the neural network; this step helped prevent the overfitting and under-fitting that could affect performance. Meanwhile, [67] proposed the EfficientNetv2 CNN family with lower training time and better efficiency than previous models, showing that carefully balancing not only the hyper-parameters but also the network depth, width, and resolution leads to better performance. In addition, we provided a better balance of weights and biases by minimizing the error with a different optimizer, Ranger, in forward and backward propagation, to better extract the features of the tumor region. Based on the related studies, four state-of-the-art CNN models, EfficientNetv1, ResNet18, ResNet200d, and InceptionV4, were included in this study alongside the EfficientNetv2 architecture to ensure the diagnosis and multi-class classification of similar brain tumor diseases.

3 Materials and methods

Recent years have witnessed rapid development in CNN technologies. CNNs have become popular due to their high sensitivity and their applicability to a wide range of research areas, such as signal processing [1, 35, 47], pattern recognition [36], and authentication systems [2, 29, 50]. CNNs are an advanced form of artificial neural network and have gained increasing attention due to both their learning stability and their capability to process images of varying quality in computer vision problems across many fields. In our approach, we used a CNN-based system to diagnose three different brain tumor types using the EfficientNetv2 model powered by the state-of-the-art Ranger optimizer and fine-tuned pre-processing. At the time of publication, to the best of our knowledge, this idea had not been considered in the literature. Building on the achievements of CNNs in solving various complex tasks, the improved framework aims at optimal multi-class brain tumor diagnosis by clarifying missing details in MRI images despite restrictions such as noise, blur, and uneven brightness. Figure 1 shows the flow diagram of the proposed system, which includes the following stages: 1. pre-processing, 2. data generation, 3. CNN framework or deep feature extraction for tumor detection, and 4. diagnosis. In the first step, the image datasets were cropped and filtered against image impairments. Next, data augmentation was applied and the whole dataset was balanced using different transformation and noise-invariance methods with different parameters. After splitting the dataset into training, validation, and test sets, the input was fed to the model trained on MR images for the detection of cancer.

Fig. 1
figure 1

Flow diagram of proposed system

3.1 Datasets

In this work, to ensure a less dataset-specific and more generalizable approach, two publicly available datasets were chosen: BR35H::Brain Tumor Detection 2020 (BR35H) [26] and the T1-weighted contrast-enhanced MRI dataset [16]. The first dataset consisted of a total of 2768 MR images of varying resolution, of which 1458 images showed different tumor types and the remaining 1310 images showed healthy brains. The second dataset included 3064 contrast-enhanced T1-weighted MR slices of meningioma, glioma, and pituitary tumors, each with a resolution of 512 × 512 pixels (pixel size 0.49 × 0.49 mm2). The images used in this study are illustrated in Fig. 2, with the tumor borders highlighted in red.

Fig. 2
figure 2

Selected samples from the T1-weighted MRI dataset used in this study, a Tumor-free sample, b Axial tumor appearance, c Coronal view of meningioma, d Axial view of glioma, e Sagittal view of pituitary

3.2 The EfficientNetv2 architecture

In recent scientific studies, the EfficientNetv2 model has been preferred as a powerful tool (faster training speed and better parameter efficiency) in smart healthcare systems, as it is capable of handling medical image analysis successfully. It has been used for automatic tuberculosis diagnosis in chest X-ray images [3], breast cancer classification [62], and COVID-19 detection using X-ray and CT images [32].

Although the EfficientNetv2 architecture follows the same design principles as EfficientNetv1, EfficientNetv2 is generally superior to EfficientNetv1 in terms of parameter count and floating-point operations (FLOPs). The FLOPs indicate the complexity of the model by measuring the number of operations performed by a frozen CNN network. Because EfficientNetv1 is trained with large image sizes, it consumes a significant amount of memory. Since the total memory of the Graphics Processing Unit (GPU) is limited, the model must be run with a smaller batch size, which slows training down considerably. One of the advantages of the EfficientNet family compared to other models is its use of depth-wise convolutions [63].

In the EfficientNetv2 architecture, the original Inverted Residual Blocks (MBConv) of EfficientNet were replaced with Fused-MBConv blocks in the early layers. A non-uniform scaling strategy was used to gradually add more layers to the later stages. Through progressive learning, training starts with small images and low regularization, and then both the image size and the regularization strength are gradually increased. This has a positive impact on the width and depth expansion of the network, helps extract discriminative features, and thus achieves better model performance with fewer parameters. In addition, the scaling rule was modified and the maximum image size was limited to a smaller value, which increased the training speed while maintaining parameter efficiency.
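
As a concrete illustration, the following sketch shows how an EfficientNetv2-S backbone could be instantiated for the three tumor classes with the timm library; the model name 'tf_efficientnetv2_s', the 3-class head, and the input size are illustrative assumptions rather than the authors' exact configuration.

```python
# A minimal sketch (not the authors' exact code) of building an EfficientNetv2-S
# classifier for the three tumor classes with the timm library.
import timm
import torch

model = timm.create_model(
    "tf_efficientnetv2_s",  # small (s) variant; the 'm' and 'l' variants are analogous
    pretrained=True,        # start from ImageNet weights, as in transfer learning
    num_classes=3,          # glioma, meningioma, pituitary
)

dummy = torch.randn(1, 3, 300, 300)  # illustrative input size; the backbone is size-flexible
logits = model(dummy)                # -> tensor of shape (1, 3)
print(logits.shape)
```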

3.3 Comparison of Ranger with current optimizers

The optimization function (OF) is a critical component of CNN networks, and the performance of the optimization algorithm directly impacts the training efficiency of a model. The compatibility of different optimization algorithms with different CNN models is achieved by optimizing the cost function through the adaptation of the network's attributes, such as the learning rate, weights, and biases. To obtain optimal results from the network, we used the cutting-edge Ranger optimizer [68], instead of the default Adam optimizer [39], for EfficientNetv2.

The Ranger optimizer combines two mechanisms: Rectified Adam (RAdam) [44] and the LookAhead stochastic optimizer [49]. RAdam introduces an automatic warm-up mechanism through a rectifier function based on the variance actually encountered, thereby stabilizing learning at the beginning of training.

Based on this variance estimate, RAdam rectifies both the variance and the generalization issues of the adaptive momentum in the Adam stochastic optimizer. The main reason for these issues is that, in the early steps of model training, the adaptive learning rate has an undesirably large variance because only a limited amount of training data has been seen. Fundamentally, there is a close relationship between stability and variance.

LookAhead keeps an exponential moving average of the weights; every 5 steps, this slow copy is updated and the current (fast) weights are replaced with it, which stabilizes learning and convergence during the rest of training. LookAhead improves network stability and reduces the variance of its inner optimizer with negligible computation and memory cost. Thus, it ensures a robust and stable trajectory during the training period, as shown in Fig. 3. In CNN networks, the correct extraction of the most representative features depends heavily on the stability of the optimizer, which reduces the variance in the process. Therefore, it is possible to obtain higher accuracy rates by combining Ranger and EfficientNetv2.
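
The sketch below illustrates the Ranger idea of wrapping an inner adaptive optimizer (here torch.optim.RAdam) with a LookAhead loop that synchronizes slow and fast weights every 5 steps; the official Ranger implementation [68] differs in details (e.g. gradient centralization), so this is only a conceptual approximation.

```python
# A minimal sketch of the Ranger idea: RAdam as the fast inner optimizer,
# LookAhead as the slow outer loop (slow <- slow + alpha * (fast - slow) every k steps).
import torch

class Lookahead:
    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base, self.k, self.alpha = base_optimizer, k, alpha
        self.param_groups = base_optimizer.param_groups
        # Slow weights start as a copy of the current (fast) weights.
        self.slow = [p.detach().clone() for g in self.param_groups for p in g["params"]]
        self.counter = 0

    @torch.no_grad()
    def step(self):
        loss = self.base.step()            # fast (inner) RAdam update
        self.counter += 1
        if self.counter % self.k == 0:     # every k steps, synchronize
            params = [p for g in self.param_groups for p in g["params"]]
            for p, s in zip(params, self.slow):
                s += self.alpha * (p.detach() - s)  # move slow weights towards fast ones
                p.copy_(s)                           # reset fast weights to the slow weights
        return loss

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)

# Wrap RAdam (Ranger's inner optimizer) with LookAhead, as Ranger does conceptually.
model = torch.nn.Linear(16, 3)
optimizer = Lookahead(torch.optim.RAdam(model.parameters(), lr=1e-3), k=5, alpha=0.5)
```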

Fig. 3
figure 3

Performance comparison of optimization functions (a) Training cost by iterations (b) Validation cost by iteration

4 Experiments

In this section, the experimental studies and a detailed evaluation of the improved system are discussed. Figure 4 illustrates the CNN-based diagnosis system for brain tumors. During the study, multiple CNN architectures and modern optimizers were trained and tested in order to identify the best-suited network. All networks were pre-trained on the ImageNet database. To construct the optimal model, we employed a fivefold cross-validation (CV) scheme, in which each fold was divided into test (10%), training (80%), and validation (10%) subsets. The learning rate of the network was initially set to 0.001 and was reduced by a gamma factor of 0.5 every 20 epochs. All models were trained for 40 epochs with a batch size of 32 and implemented in Python v.3.7 on a platform with a 2.0 GHz Intel Xeon CPU and an NVIDIA Tesla T4 GPU with 13 GB of memory. Algorithm 1 and Algorithm 2 describe the training and testing algorithm of the CNN models and the pseudocode of the overall proposed framework, respectively. In the following subsections, the pre-processing, input data generation, and CNN models are described in more detail. The result of each operation is statistically verified and shown graphically.
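
The following minimal sketch reflects the reported training schedule (initial learning rate 0.001 halved every 20 epochs, 40 epochs, batch size 32); the model, data loaders, and the fivefold split are hypothetical placeholders rather than the authors' code.

```python
# A minimal sketch of the reported training schedule for one cross-validation fold.
import torch

def train_one_fold(model, train_loader, device="cuda", epochs=40):
    criterion = torch.nn.CrossEntropyLoss()
    # RAdam stands in here for Ranger's inner optimizer; see the LookAhead sketch above.
    optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)
    # Learning rate reduced by a gamma factor of 0.5 every 20 epochs, as in the paper.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model

# Usage (placeholders): for each of the 5 CV folds, split the data 80/10/10,
# build loaders with batch_size=32, and call train_one_fold(model, train_loader).
```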

Fig. 4
figure 4

Overview of the CNN-based brain tumor diagnosis system

Algorithm 1
figure a

Training and testing algorithm of the CNN models

Algorithm 2
figure b

Proposed brain tumor diagnosis CNN model in healthcare

4.1 Pre-processing stage

To increase the productivity and stability of neural networks, image pre-processing is generally considered an indispensable procedure for image-based systems such as CNNs. However, it is a complicated and time-consuming procedure that has to extract the most appropriate features for successful detection. Figure 5 illustrates the image-processing techniques applied in this work: cropping, histogram equalization, and denoising.

Fig. 5
figure 5

Results of preprocessing with cropping, histogram equalization and denoising filter a Glioma, b Meningioma, and c Pituitary

In order to produce reliable results and to alleviate the overall computational effort, it is important to select the most effective pre-processing tasks for the available data. It should be noted that an image carries not only spatial information but also noisy, interpolated pixels, which cause significant performance degradation or poor learning conditions. Having a large amount of noise-free, high-quality data is an important factor that allows CNN models to learn effectively without overfitting and considerably increases their performance [73].
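
A possible realization of this pre-processing stage is sketched below with OpenCV (cropping to the largest contour, histogram equalization, and non-local means denoising); the threshold and filter-strength values are illustrative assumptions rather than the exact settings used in the paper.

```python
# A minimal pre-processing sketch: crop to the skull/brain region, equalize the
# histogram, and denoise. Assumes an 8-bit grayscale slice and OpenCV 4.x.
import cv2
import numpy as np

def preprocess(gray: np.ndarray) -> np.ndarray:
    # 1. Crop: threshold the scan and keep the bounding box of the largest contour.
    _, mask = cv2.threshold(gray, 20, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    cropped = gray[y:y + h, x:x + w]

    # 2. Histogram equalization to spread the gray-level distribution.
    equalized = cv2.equalizeHist(cropped)

    # 3. Denoising (non-local means as one possible choice of filter).
    denoised = cv2.fastNlMeansDenoising(equalized, None, h=10,
                                        templateWindowSize=7, searchWindowSize=21)
    return cv2.resize(denoised, (300, 300))

# image = cv2.imread("slice.png", cv2.IMREAD_GRAYSCALE)
# processed = preprocess(image)
```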

4.2 Data generation

Data augmentation is a frequently preferred data balancing technique in CNN-based methods used to achieve reliable, satisfactory results. Although it provides technical benefits such as increasing the prediction accuracy of the model and reducing overfitting and underfitting, feeding the network with unrealistic data can introduce unnecessary information into the learning process; it also adds memory usage, transformation computation costs, and training time. Therefore, the use of data augmentation is limited in this study. In determining the augmentation characteristics, the selection of appropriate parameters involves a series of processes that improve the quality of the training data so that stable deep networks can be constructed. After the original datasets were separated into training, validation, and testing sets, all MR images were augmented to allow the CNN model to learn invariant features. These features ultimately improved the model's performance and robustness, in addition to preventing overfitting and underfitting. The selected augmentation techniques are rotation, zooming, horizontal flipping, vertical flipping, and height shifting for geometric transformation invariance. Table 1 lists the initial parameters with which the existing dataset was expanded to produce better test accuracy rates; an illustrative implementation is sketched after the table.

Table 1 Data augmentation procedures with initial parameters
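
The sketch below shows how the listed augmentations (rotation, zoom, horizontal/vertical flips, height shift) could be assembled with torchvision; the numeric ranges are illustrative placeholders, not the exact Table 1 parameters.

```python
# A minimal augmentation sketch with torchvision; values are illustrative only.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomAffine(
        degrees=15,            # rotation
        translate=(0.0, 0.1),  # height shift only (no horizontal shift)
        scale=(0.9, 1.1),      # zoom in/out
    ),
    transforms.ToTensor(),
])
```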

The augmented training dataset for tumor detection consists of 9342 images with tumors and 8649 images of healthy brains. In addition, 9252, 4554, and 6111 new images were generated for the glioma, meningioma, and pituitary tumor types, respectively. The statistics of the MRI datasets for the CNN models are given in Table 2.

Table 2 Statistics of MRI datasets

4.3 Experiments on EfficientNetv2 versions

The EfficientNetv2 family has three scales: small (s), medium (m), and large (l). In this study, it was first used to construct the brain tumor diagnosis system. Because the parameter size of the EfficientNetv2l model is larger than that of EfficientNetv2s and EfficientNetv2m, it consumed the computer's GPU resources at an extreme level. As given in Table 3, the number of trainable parameters used by the models was 20.18 M, 52.86 M, and 117.23 M for the s, m, and l versions, respectively.

Table 3 Comparison of EfficientNetv2 versions

As seen in the table, while the pre-trained EfficientNetv2l achieved the best accuracy scores by using the learned parameters up to the limit of the computer's computational capability, its training time was much longer than that of the others (27.68 min with augmentation). It was observed that increasing the model depth increased the training time significantly. The EfficientNetv2s has a fast convergence speed (11.15 min with augmentation) and a stable training procedure, with a satisfactory test accuracy of 97.36% on the augmented dataset. Consequently, EfficientNetv2 can adaptively adjust regularization (e.g. data augmentation) through the image size and improve the training procedure using progressive learning. Therefore, when the depth of the model increases, the inputs are resized to 128 × 128 pixels to reduce memory consumption. The outputs of the tumor diagnosis systems were evaluated before and after data augmentation, which improved the overall accuracy, helped the models learn invariant features, and reduced overfitting. With the above training strategy, the goal of each model was achieved, while the Ranger optimizer minimized the prediction loss.

4.4 Classification performance of optimizers

In the second experiment, we analyzed the effect of the optimization algorithms on CNN performance and measured their variability and reliability in automatic classification. Using the Ranger optimizer instead of the default optimizers allowed the CNN model to avoid overfitting problems and to achieve superior classification performance. To evaluate the optimizers, we compared the CNN using Cohen's Kappa coefficient [18] and the Hamming Loss metric [23] for the four optimizers previously mentioned in Section 3.3. The detailed results for each optimizer are given in Table 4. The selected evaluation metrics are expressed in Eqs. (1) to (7) [30].

$$\text{Accuracy} = (TP + TN)/(TP + TN + FP + FN)$$
(1)
$$\text{Precision} = TP/(TP + FP)$$
(2)
$$\text{Recall} = TP/(TP + FN)$$
(3)
$$\text{F1-Score} = (2 \times \text{Precision} \times \text{Recall})/(\text{Precision} + \text{Recall})$$
(4)
$$P_{e} = \left((TP + FN) \times (TP + FP) + (FP + TN) \times (FN + TN)\right)/(TP + TN + FP + FN)^{2}$$
(5)
$$\text{Cohen's Kappa} = (\text{Accuracy} - P_{e})/(1 - P_{e})$$
(6)
$$\text{Hamming Loss} = (FP + FN)/(FP + FN + TP + TN)$$
(7)
Table 4 Classification performance of optimizers with EfficientNetv2s

In these equations, TP, TN, FP, and FN denote the true-positive, true-negative, false-positive, and false-negative counts, respectively, and Pe (Eq. 5) denotes the agreement expected by chance. Compared with SGD, RMSprop, and Adam, Ranger showed notable improvements, reaching 0.9770 ± 0.0094 Cohen's Kappa, 0.0147 ± 0.0012 Hamming Loss, and 98.60% ± 0.14 test accuracy. In terms of training time, SGD achieved the fastest convergence at 32.29 min, whereas Ranger converged the slowest at 42.28 min.
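
For reference, the metrics of Eqs. (1)-(7) can be computed with scikit-learn as sketched below; the label vectors are hypothetical placeholders.

```python
# A minimal sketch of computing the evaluation metrics from ground-truth and
# predicted labels (y_true and y_pred are hypothetical placeholders).
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             cohen_kappa_score, hamming_loss)

y_true = [0, 1, 2, 1, 0, 2]   # e.g. 0 = glioma, 1 = meningioma, 2 = pituitary
y_pred = [0, 1, 2, 0, 0, 2]

accuracy = accuracy_score(y_true, y_pred)                                   # Eq. (1)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                           average="macro") # Eqs. (2)-(4)
kappa = cohen_kappa_score(y_true, y_pred)                                   # Eq. (6)
h_loss = hamming_loss(y_true, y_pred)                                       # Eq. (7)
print(accuracy, precision, recall, f1, kappa, h_loss)
```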

4.5 Comparison with current CNNs

To assess the classification potential of the brain tumor diagnosis system across different CNN models, five types of CNN networks were trained and tested. The metrics of accuracy, precision, recall, and F1-score were calculated and expressed as mean ± standard deviation for the C-category classification. Table 5 compares the classification performance of the various CNN models on the 3-tumor-type classification of independent test data in terms of micro and macro statistics.

Table 5 Comparison of classification performances of different CNN models

As Table 5 shows, the average training time of every operation related to EfficientNetv2s + Ranger was 41.99 min with a standard deviation of 0.57 over 40 epochs, whereas the training times for ResNet18, ResNet200d, and InceptionV4 were 12.1, 102.4, and 54.5 min, respectively. The EfficientNetv2s + Ranger model showed superior performance over the other CNNs and achieved 99.85% test accuracy in repeated tests without significantly increasing the parameter size or computational cost. As seen in the table, ResNet18 and InceptionV4 have lower accuracy values than the other CNNs; ResNet18 has a significantly reduced number of parameters and a satisfactory performance compared to InceptionV4. While ResNet200d is the largest model, the micro and macro scores show that, at the cost of a high training time, it achieved almost the same accuracy as the other CNNs. These results indicate that the improved model enabled effective learning of the edge and texture characteristics of tumors, as measured by accuracy and the micro and macro statistical metrics. To further confirm the generalization performance of our system, the Receiver Operating Characteristic (ROC) curves and AUC values were also calculated for all models, as shown in Fig. 6. The ROC curve of our improved model was slightly above those of the other models, with an AUC value of 0.9985.
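
A minimal sketch of the one-vs-rest ROC/AUC evaluation behind Fig. 6 is given below; the ground-truth labels and softmax outputs are hypothetical placeholders.

```python
# A minimal sketch of per-class ROC curves and macro AUC for the 3-class problem.
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_true = np.array([0, 1, 2, 1, 0, 2])          # hypothetical ground truth
probs = np.array([[0.8, 0.1, 0.1],             # hypothetical softmax outputs of a model
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.6, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.1, 0.8]])

y_bin = label_binarize(y_true, classes=[0, 1, 2])
macro_auc = roc_auc_score(y_true, probs, multi_class="ovr", average="macro")
for c in range(3):                             # one-vs-rest ROC curve per tumor class
    fpr, tpr, _ = roc_curve(y_bin[:, c], probs[:, c])
    print(f"class {c}: AUC = {auc(fpr, tpr):.3f}")
print("macro AUC:", macro_auc)
```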

Fig. 6
figure 6

ROC curve of the different CNN models on brain tumor classification

Confusion matrices are presented graphically in Fig. 7 to identify the discordance between the three classes for each CNN model. While the EfficientNetv2 + Ranger model outperformed the other models for glioma tumors, InceptionV4 mispredicted only two samples of the meningioma class, and ResNet200d appeared to separate pituitary tumors better.

Fig. 7
figure 7

Confusion matrices of the DL models on the independent test data. a ResNet18, b ResNet200d, c InceptionV4, and d EfficientNetv2s + Ranger

4.6 Statistical comparisons

In this section, we used the Cochran Q test [17] to control the overall type I error level and to build pairwise comparisons. The test requires a binary matrix and more than two classifiers of the same size. Figure 8 illustrates the binary comparisons between the models through the Cochran Q test, corresponding to the results presented in Table 5. At a significance level of α = 0.05, most model pairs did not show a statistically significant difference, except for CV4. EfficientNetv2 + Ranger showed a statistically significant difference from both ResNet18 and InceptionV4, which indicates that this clear difference may have been caused by the dataset.
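
The sketch below shows how such a Cochran Q comparison can be run with statsmodels on a binary correctness matrix (rows are test samples, columns are classifiers); the matrix here is hypothetical, and pairwise tests use two columns at a time.

```python
# A minimal sketch of the Cochran Q test on per-sample correctness (1 = correct, 0 = wrong).
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

# rows = test samples, columns = classifiers (hypothetical correctness values)
correct = np.array([
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
])

result = cochrans_q(correct)
print("Q statistic:", result.statistic, "p-value:", result.pvalue)
# p < 0.05 -> reject the hypothesis that all classifiers share the same error profile
```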

Fig. 8
figure 8

Statistical comparisons between different CNN models according to the outputs of fivefold CV. Each mini square indicates whether the corresponding pair of models shows a statistical difference according to the Cochran Q test (p > 0.05). Blue specifies a statistical difference, orange specifies no statistical difference, and gray specifies that the results are not applicable

5 Results and discussion

In the first experiment, we compared the three versions of the EfficientNetv2 family. The dataset of raw images was split into training, validation, and test sets. Next, all experiments were conducted on pre-processed images with and without augmentation. The hyperparameters were left at their default settings, i.e., the learning rate was 0.001 and the decay factor gamma was set to 0.5. All models were run with fivefold cross-validation, and each training time was recorded separately. The accuracy of the models was higher after the data were augmented, and for all models the training time with augmentation was longer than without it. The EfficientNetv2l achieved the highest test accuracy of 97.72% with augmentation, at the cost of the highest training time, as reported in Table 3.

In the second experiment, the SGD, RMSprop, Adam, and Ranger optimizers were used with the EfficientNetv2s architecture. Among these four optimizers, SGD showed the poorest convergence with 97.36% test accuracy. Ranger, using its variance-based adaptive momentum, showed slightly better performance, with 98.60% test accuracy, a 0.9770 Cohen's Kappa, and a 0.0147 Hamming Loss on the test set, as given in Table 4. However, its training time was the longest.

In the third experiment, according to the results evaluated and analyzed with the micro and macro metrics, the improved CNN model performed better than the selected state-of-the-art CNN models in diagnosing tumor diseases, as presented in Table 5. ResNet18 and InceptionV4 performed similarly in the classification of the test data, with 99.62% and 99.69%, respectively. Although the ResNet200d architecture gave the scores closest to ours, it had the longest training time of 102.48 min due to its deep network structure.

The feature maps of the internal layers extracted from block0 to block3 are illustrated in Fig. 9 to further clarify the learning details of the EfficientNetv2s + Ranger network and the EfficientNetv2s network with the default Adam optimizer. In the first block, comparing the two on a glioma test sample, we observed that EfficientNetv2s + Ranger better preserved the edges (marked with red boxes) and focused the gradients in the tumor region; this can also be seen at the skull edges. In contrast, EfficientNetv2s + Adam spread the edge details in its gradient results. In the later blocks, it can be deduced that the final models suppressed weak characterizations, such as homogeneous regions, in the layer outputs.
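
The feature maps of such early blocks can be captured with forward hooks, as sketched below for a timm EfficientNetv2 model; the model name, block indices, and dummy input are assumptions for illustration only.

```python
# A minimal sketch of capturing intermediate feature maps (block-0 ... block-3) with hooks.
import timm
import torch

model = timm.create_model("tf_efficientnetv2_s", pretrained=True, num_classes=3).eval()
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# timm's EfficientNet-family models expose their stages as `model.blocks[i]`.
for i in range(4):                         # block-0 ... block-3, as in Fig. 9
    model.blocks[i].register_forward_hook(save_output(f"block{i}"))

with torch.no_grad():
    model(torch.randn(1, 3, 300, 300))     # a dummy input stands in for a real MR slice

for name, fmap in features.items():
    print(name, tuple(fmap.shape))         # channel maps can then be visualized per block
```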

Fig. 9
figure 9

Visualization of feature maps of EfficientNetv2s+Adam and EfficientNetv2s+Ranger extracted from a block-0, b block-1, c block-2, and d block-3

Table 6 presents the dataset, the development environment, a brief explanation, and the accuracy rates of the proposed system alongside other methods that provide different solutions to the problem. [14] estimated the Isocitrate Dehydrogenase (IDH) mutation status of gliomas from MRI by applying a residual CNN to preoperative radiographic data; the test accuracy of their model reached 85.7%. One of the challenges in training CNN networks is that they require a large quantity of training data. In their study, small datasets from three different private hospitals were used and data augmentation was applied to avoid overfitting and underfitting. ELM is another type of learning technique built from one or more layers of hidden nodes. In [57], a CNN model consisting of several intermediate layers was studied both to normalize the data and to create an effective mechanism.

Table 6 Detailed comparison of cutting-edge DL-based brain tumor diagnostic methods

The classification accuracy obtained by the KE-CNN model was 93%. [11] explored the performance of three distinct deep CNN architectures trained on malignant intracranial MR images to discriminate between High-Grade Glioma (HGG) and Low-Grade Glioma (LGG). Their VolumeNet was trained on a 3D volumetric dataset instead of 2D slices and, while it greatly improved the findings with 97% classification accuracy, the large size of the model is a disadvantage. [45], inspired by the ResNet34 architecture, designed a CNN model consisting mainly of all-convolutional layers plus a global pooling layer. They called their network G-ResNet, and in their approach the accuracy was measured as 95% with a reduced number of parameters. [24] developed a new CNN model combining typical trait groups to recognize IDH mutations and TERT promoter (pTERT) mutations. The tumor lesion area was extracted from the image and normalized before being fed to the AlexNet architecture; the model achieved a final accuracy of 63.1% on a private dataset. Another brain tumor classification method, using a Deep CNN (DCNN) architecture with only 22 layers and patches extracted from whole MR images, was tested in [10]. Assuming limited hardware resources, their shallow model can be appropriate for settings with narrow network bandwidth or short response times, such as mobile phones or conventional PCs. Meanwhile, in [46], a GAN-based model using ResNet50 on public and private datasets was implemented to further demonstrate the feasibility and productivity of diagnosing the IDH status of glioma tumors. The authors emphasized that IDH is a significant biomarker for glioma, and the model ultimately achieved a better test accuracy of 88%. A wealth of research has considered modeling the detection and identification of brain tumors; [51] therefore prepared a noteworthy prospective survey addressing this issue based on recent AI. In this survey, the accuracy rates of the two VGGNet-based models generated by [61] and [15] were 93% and 94%, respectively. The approaches given in [7, 22], and [25] classify T1-weighted contrast-enhanced MR images. [22] discusses a multi-scale deep architecture that operates at three spatial scales through different procedures. However, this approach performs diagnosis after segmentation, which makes the model computationally more complex and impractical for medical applications due to the accumulated errors. Rather than handling segmentation, [7] showed that the tumor type can be estimated from optimized GoogleNet and ResNet101 models with transfer learning; they attained a highest performance of 99.33% in tumor detection and 95.65% in tumor diagnosis. More recently, ensemble learning methods for diagnosing brain tumors have enabled more reliable feature extraction from MR images when only small amounts of brain tumor data are available. In [25], brain tumor recognition was first performed using a modified InceptionResNet pre-trained model; in the second phase, when a tumor lesion was detected, the tumor type was specified using a combination of InceptionResNet and a Random Forest Tree (RFT). The last line of the table gives the accuracy of our methodology. Among all of the methods, our method achieved almost the same test accuracy in tumor detection as [25] at 99%, while EfficientNetv2 + Ranger yielded the best performance for tumor classification at 98.85%.

Although the proposed CNN-based approach can be used as an alternative to existing methods of brain tumor diagnosis, its performance has some limitations. These depend on, among other factors, the dataset used for training, the limitations of MRI scanning, and the deep learning methodology involved in training such systems. Two publicly available datasets were used in this study, so the clinical information about the patients was limited. To improve the robustness of the CNN models, future multicenter studies with more detailed clinical data and patient datasets are required. Additionally, because the radiological features of primary tumors, metastases, and central nervous system cysts are similar, conventional MRI is not fully reliable for discriminating between them; additional examinations such as computed tomography and biopsy are required for an exact diagnosis. CNN-based methods also have structural limitations. For example, problems such as misclassification may be encountered when applying CNN models, so they may be ineffective in making the correct diagnosis of brain tumors and may be expensive in terms of computational costs.

6 Conclusions

In cases of high or weak contrast in MRI brain images, there may be uncertainty when deciding on the presence of a tumor. Computer-aided diagnostic systems can support clinicians and radiologists in distinguishing between tumors and healthy tissue. With radiological imaging, the presence of cancer and similar diseases is observed, and information about their size and location is obtained. With the help of AI-based solutions and algorithmic innovation, a reliable diagnosis of tumor tissues with high similarity can be achieved. We presented an end-to-end CNN-based brain tumor classification system for meningioma, glioma, and pituitary tumors based on MRI in neurosurgery clinics and demonstrated the following findings:

  • We built the EfficientNetv2 + Ranger model, which had not been previously considered in the literature, and achieved 99% test accuracy in tumor detection and 99.85% test accuracy in predicting meningioma, glioma, and pituitary tumors.

  • In CNN networks, applying extensive pre-processing to the raw dataset significantly improved the learning capabilities.

  • The Ranger algorithm provides robust and stable learning throughout the training period in different CNN architectures by reducing the variance.

  • By performing a wide range of experiments on tumor datasets, we examined recent CNN models and optimizers and emphasized convenient choices for recognition in smart healthcare.

This study showed that AI-based methods can support radiologists in validating their initial scans of brain tumors for multi-classification purposes. Unfortunately, the available datasets were obtained retrospectively by the dataset providers and the clinical diagnoses of the patients were not definitively confirmed. In a case where the patient has different symptoms, the detection of a brain tumor can change the diagnosis and treatment planning for clinicians and medical pathologists. In addition, using CNNs for tumor diagnosis requires identifying features such as shape, size, location, and extensive abnormalities across various orientations of the tumors, rather than relying on handcrafted features. Another challenge is that, for this problem, the implementation of CNN models can be ineffective and computationally expensive.

Despite these adverse conditions, cutting-edge AI-based solutions have made great progress in identifying cancers in recent years. Accordingly, the proposed method encourages the improvement of current patient diagnosis techniques by incorporating CNN-based techniques into the health profession. This study paves the way for incorporating CNNs into medicine by significantly increasing the rate of cancer recognition.

In future work, we will refine the diagnostic output by considering the 3D spatial information among the brain MR slices using a lightweight CNN architecture, inspired by recent studies on radiomic analyses [37, 42, 65]. We also plan to collect information on the clinical features of a custom dataset, so that data from patients suffering from brain tumors can be used in CNN networks as complementary discriminative features.