Introduction

Magnetic resonance imaging (MRI) has become an invaluable tool in contemporary diagnosis and treatment planning, particularly for brain tumors. Its non-invasive capability to capture images in multiple planes and provide highly detailed images of soft tissues, with sensitivity and superior image contrast, has revolutionized the field of medical imaging [1]. Radiologists utilize these images to analyze, detect, and classify potential tumors in patients. However, this process is inherently time-consuming and demands a high level of medical expertise, making it a formidable challenge. Machine learning has emerged as a promising solution to address these needs, offering the potential for faster diagnosis and higher accuracy.

This study explores brain tumor classification from MRI images through two distinct yet complementary approaches: shallow neural networks supported by preprocessing methods, and deep neural networks fine-tuned to capture the distinguishing features of brain neoplasms.

The initial phase of this research focuses on shallow neural networks, which require a preprocessing routine for feature engineering. Principal Component Analysis (PCA) proves to be a valuable preprocessing technique, commonly employed in image processing with artificial neural networks for dimensionality reduction, feature extraction, and noise reduction [3]. PCA is applied to transform the original image data into a lower dimensional representation while retaining the most critical information.

The second approach uses deep neural networks to classify brain tumors. It builds on the ResNet50 architecture and uses fine-tuning to recalibrate the network’s learned parameters through iterative gradient descent, so that the pre-trained network adapts to MRI images and can classify brain tumors accurately.

In the following sections, we describe the development of each approach for classifying brain tumors using MRI scans. First, we explore the design of the neural networks and their application in computational analysis. Subsequently, we compare the effectiveness of the shallow and deep network methodologies through empirical assessments. We aim to provide a comprehensive view of how brain tumor classification techniques have evolved. Overall, this study explores the field of medical imaging from a computational perspective.

Related Work

In the medical domain, the use of digital images for diagnosis is growing, since effective brain tumor treatment depends on early detection. MRI is a non-invasive scan that produces detailed images of the inside of the body; it provides high contrast in soft tissue, which is useful for detecting and diagnosing brain tumors and other lesions [18]. Given the complexity of brain tumors, recognizing and segmenting them in MRI images is not a trivial task; it is time-consuming and requires highly qualified staff to interpret the images [4].

Computer-aided methods based on MRI data for brain tumor classification can be broadly grouped into semi-automated and fully automated methods [2]. Semi-automated methods rely on manually extracted features that then serve as input to classifiers, whereas fully automated methods handle the whole process with deep learning algorithms such as convolutional neural networks (CNN) [12].

Reliable tumor detection is exceedingly difficult because brain tumors vary in appearance, size, shape, and structure [2]; therefore, numerous improvements are still needed to segment and categorize the tumor region effectively [16]. Even though tumor segmentation algorithms have demonstrated tremendous promise in evaluating and detecting tumors in MR images, classifying healthy and unhealthy images and identifying the substructures of tumor regions remain limited and difficult [13].

Studies have therefore approached this task in different ways, including support vector machines (SVM), bag-of-words, Fisher vectors, k-nearest neighbors (KNN), fuzzy c-means, self-organizing maps, and artificial neural networks such as multilayer perceptrons and convolutional neural networks, the latter either designed from scratch or adapted through fine-tuning [1, 10].

In addressing this problem, our approach is informed by two key studies. The first study [6] provides a vital foundation, as it introduces the shared dataset and employs it for analysis, focusing on the automatic classification of tissue types within the region of interest (ROI) in T1-weighted contrast-enhanced MRI (CE-MRI) images. Specifically, the study classifies three types of brain tumors: meningioma, glioma, and pituitary tumor. Rather than relying solely on the original tumor region, the authors use a tumor region augmented through image dilation as the ROI. The augmented tumor regions are partitioned into increasingly fine ring-form subregions (up to three levels), capturing the tumor and surrounding tissues and providing essential clues for tumor classification. Unlike the first study, where the images were partitioned to capture surrounding tissues, we worked with the original images. This decision aimed to explore the capabilities of an artificial neural network for classifying the tumor type from conventionally processed images.

The second study focuses on brain tumor classification using the same dataset of contrast-enhanced MRI (CE-MRI) images [14]. Its key innovation lies in the use of pre-trained convolutional neural networks (CNNs) and transfer learning. The authors implement a block-wise fine-tuning strategy that progressively extends fine-tuning to earlier blocks of the CNN architecture, resulting in an enhanced classification method. To ensure the robustness of their findings, the researchers conducted extensive experiments and employed fivefold cross-validation.

In this study, we employ both shallow and deep neural network approaches to compare their performance in a multi-class classification task using a real dataset. This dataset, like many in the field, presents typical challenges, including a limited number of examples, imbalanced class distributions, high dimensionality, lack of segmentation, and the inherent difficulty of distinguishing between classes.

Exploratory Data Analysis

Dataset

The “Brain Tumor Data set” was posted in 2017 by Jun Cheng [5]. It contains 3064 T1-weighted contrast-enhanced images, in “.png” and “.mat” format, from 223 patients with three kinds of brain tumors. Figure 1 shows the three tumor types: meningioma (708 slices, label 1), glioma (1426 slices, label 2), and pituitary tumor (930 slices, label 3). Each image has four channels and a resolution of 512 \(\times\) 512 pixels. This data set is split into two groups, one for training and validation and the other for testing.

Fig. 1 Types of tumors in the data set, meningioma (left), glioma (center), and pituitary (right)

We started by exploring the number of images per class in the dataset. It had 708 images for meningioma tumors, 1426 images for glioma tumors, and 930 images for pituitary tumors. This matters because training on unequal class sizes can bias the model toward the largest class, so the same number of images per class is needed for training.

Fig. 2 Four channels of an example per class

We then analyzed the dimensions of each image to examine whether all images for the three classes had the same resolution and number of channels. The 708 images of the meningioma class and the 1426 images of the glioma class all had a resolution of 512 \(\times\) 512 pixels with four channels. In the pituitary class, 915 images had a resolution of 512 \(\times\) 512 pixels with four channels, while the remaining 15 had 256 \(\times\) 256 pixels with four channels. Therefore, these 15 images needed to be resized so that all images shared the same 512 \(\times\) 512 resolution.

Fig. 3 Histogram of colors per channel with 500 images per class, meningioma (top), glioma (middle), and pituitary (bottom)

Each channel is visualized using one example per class to identify the tumor zone and determine which channel provides more details for effective separation from the rest of the image, enhancing contrast. Figure 2 illustrates that the first and second channels offer substantial information about the tumor. The first channel represents the dark zone, while the second channel represents the bright zone. In contrast, the third channel contributes limited information about the tumor, and the fourth channel does not provide any relevant information about the tumor.

Histograms of colors per channel were constructed to assess potential relationships between channel type and class. These histograms were obtained from 500 images per class. Figure 3 shows all these histograms, with the three histograms on the top representing the meningioma tumor for the first to the third channel. The next three represent the glioma tumor, and the bottom three represent the pituitary tumor. As shown in the figure, the histograms follow similar patterns across classes, but the pixel frequencies differ. The fourth channel is omitted, because it has a value of 255 for all images.
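The per-channel histograms described above can be sketched as follows. This is a minimal illustration with NumPy on synthetic data, not the code used in the study; the function name `channel_histograms`, the 256-bin choice, and the small random batch are assumptions made for the example.

```python
import numpy as np

def channel_histograms(images, n_channels=3, bins=256):
    """Accumulate one intensity histogram per channel over a stack of images.

    images: uint8 array of shape (n_images, height, width, channels).
    Returns an integer array of shape (n_channels, bins).
    """
    hists = np.zeros((n_channels, bins), dtype=np.int64)
    for c in range(n_channels):
        counts, _ = np.histogram(images[..., c], bins=bins, range=(0, 256))
        hists[c] = counts
    return hists

# Synthetic stand-in for a batch of four-channel images (not real MRI data).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(5, 64, 64, 4), dtype=np.uint8)
batch[..., 3] = 255  # constant fourth channel, as reported in the text

hists = channel_histograms(batch)  # fourth channel skipped (n_channels=3)
```

Each row of `hists` can then be plotted per class to compare how pixel frequencies differ between tumor types.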

Data Preprocessing

The initial step involved resizing 15 images in the pituitary class from 256 \(\times\) 256 to a resolution of 512 \(\times\) 512 to maintain consistency and minimize noise in the model. As previously demonstrated, the first and second channels contain more information for pinpointing the tumor’s location. In Fig. 4, various combinations of the first three channels are depicted. The first row emphasizes the second channel, while the second row prioritizes the first channel. For this particular model, we opted for a weighted combination, with values of 0.1 assigned to the first channel, 0.8 to the second channel, and 0.1 to the third channel.
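The two steps above, resizing and the weighted channel combination, can be sketched as follows. The weights 0.1, 0.8, and 0.1 come from the text; the nearest-neighbor upsampling is an assumption, since the interpolation method is not stated, and the function names and placeholder image are hypothetical.

```python
import numpy as np

WEIGHTS = np.array([0.1, 0.8, 0.1], dtype=np.float32)  # channels 1, 2, 3

def upsample_2x(image):
    """Nearest-neighbor 2x upsampling (assumed; interpolation is not stated)."""
    return image.repeat(2, axis=0).repeat(2, axis=1)

def combine_channels(image, weights=WEIGHTS):
    """Collapse the first three channels into one weighted single-channel image."""
    return image[..., :3].astype(np.float32) @ weights

small = np.full((256, 256, 4), 100, dtype=np.uint8)  # placeholder image
resized = upsample_2x(small)      # shape (512, 512, 4)
gray = combine_channels(resized)  # shape (512, 512)
```

Because the weights sum to 1, the combined image stays in the same intensity range as the input channels.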

Fig. 4 Some combinations of the three channels

Figure 5 presents three distinct views from an MRI. The first view displays the image using all four channels, the second view shows the image using only the first channel, and the third view is the first-channel image with added contours.

Fig. 5 Reduction of channels and visualization with contours

An additional reduction was implemented to reduce the dimensionality and extract the most discriminative features from the data. Initial visualization of the reduction was conducted to evaluate the data’s behavior, employing the PCA algorithm. Figure 6 illustrates the cumulative variance curves with respect to the number of principal components. The first curve represents the meningioma class, where the first 250 components contributed to a variance of 90.77%, and the first 400 components accounted for a variance of 95.77%. The second curve pertains to the glioma class, wherein the first 400 components were responsible for a variance of 90.42%.

Fig. 6 Cumulative curve of variance with PCA to each class, meningioma (upper left), glioma (upper right), and pituitary (bottom left) and cumulative curve of variance with PCA to all data set (bottom right)

Fig. 7 Visualization of reduction with PCA. The first row shows the reduction and the second row shows the original one-channel image

Fig. 8 Pseudo-images of 40 × 25 pixels to train-validate-test the model

The entire dataset underwent dimensionality reduction to reduce computational complexity and noise. Figure 6 displays the cumulative variance curve as a function of the number of principal components. As observed in this curve, the first 550 components accounted for 90.45% of the variance, while the first 1000 components represented 94.89% of the variance. A selection of 1000 components was made for the model, generating pseudo-images with a resolution of 40 \(\times\) 25 pixels. Figure 7 visualizes the images after PCA: the first row displays the reconstructions, which retain 94.89% of the variance, and the second row shows the original one-channel images. Additionally, to assess how the images would contribute to model training, validation, and testing, a visualization of the 40 \(\times\) 25 pixel pseudo-images is presented in Fig. 8. Consequently, the initial dataset with 512 \(\times\) 512 \(\times\) 4 (1,048,576) input values per image underwent preprocessing, resulting in data with 1000 features per image employed for model training, validation, and testing.
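The PCA reduction and the reshaping into pseudo-images can be sketched with NumPy's SVD. This is a toy-sized illustration on random data, not the study's pipeline: 400 features stand in for the flattened images, 20 components stand in for the 1000 selected, and the 5 \(\times\) 4 reshape is analogous to 40 \(\times\) 25. All names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean, the top principal axes, and the cumulative variance ratio."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes; squared singular values are proportional
    # to the variance explained by each axis.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    return mean, Vt[:n_components], np.cumsum(var_ratio)

def pca_transform(X, mean, components):
    """Project the centered data onto the selected principal axes."""
    return (X - mean) @ components.T

# Toy stand-in: 60 flattened "images" with 400 features each (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 400))

mean, components, cumvar = pca_fit(X, n_components=20)
Z = pca_transform(X, mean, components)  # shape (60, 20)
pseudo_images = Z.reshape(-1, 5, 4)     # analogous to 1000 -> 40 x 25
```

Reading `cumvar[k - 1]` gives the fraction of variance retained by the first `k` components, which is the quantity plotted in a cumulative variance curve.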

Approach Based on a Shallow Neural Network

Deep neural networks often outperform shallow networks in tasks such as image recognition, speech recognition, and natural language processing. However, shallow neural networks are often easier to interpret, as the relationships between inputs and outputs are more straightforward, and they can often be trained more quickly than deep neural networks.

MLP Model Design

Since the data were unbalanced, the number of images per class needed to be the same for the three classes in the training and validation sets. As the smallest class was the meningioma tumor, the split of the three sets (training, validation, and test) was designed based on this class.

Training–validation–test split. The first step was to specify the percentage of data for the test set. Because the training–validation data were sized according to the smallest class, 90% of that class determined the number of images per class used for the training–validation process, and the remaining images formed the test set; as a result, the number of test images differed between classes. The training–validation data were then split into training and validation sets, with 20% assigned to validation; the training set therefore had 509 images per class, and the validation set had 128 images per class.
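The arithmetic behind this balanced split can be checked with a short sketch. The class counts and percentages come from the text; the rounding rules (truncating the 90% share, rounding the validation share up) are assumptions chosen because they reproduce the reported 509 and 128 images per class.

```python
import math

CLASS_COUNTS = {"meningioma": 708, "glioma": 1426, "pituitary": 930}

smallest = min(CLASS_COUNTS.values())                   # 708 (meningioma)
train_val_per_class = int(0.9 * smallest)               # 637 images per class
val_per_class = math.ceil(0.2 * train_val_per_class)    # 128 images per class
train_per_class = train_val_per_class - val_per_class   # 509 images per class

# Whatever is left in each class goes to the (unbalanced) test set.
test_counts = {name: n - train_val_per_class for name, n in CLASS_COUNTS.items()}
test_total = sum(test_counts.values())
```

Under these assumptions the test set holds 71 meningioma, 789 glioma, and 293 pituitary images, 1,153 in total.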

Table 1 Architecture of the multilayer perceptron model

A multilayer perceptron model was designed for this classification task. The architecture of this shallow network, as shown in Table 1, had four layers: an input layer of 1000 neurons, a first hidden layer of 150 neurons with a dropout regularization of 35%, a second hidden layer of 20 neurons with a dropout regularization of 30%, and an output layer of 3 neurons.
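A forward pass through a network with these layer sizes can be sketched in plain NumPy. The layer sizes and dropout rates come from the text; the ReLU activations, the softmax output, the weight initialization, and all names are assumptions for illustration, and this is not the Keras model used in the study.

```python
import numpy as np

LAYER_SIZES = [1000, 150, 20, 3]  # input, two hidden layers, output
DROPOUT = [0.35, 0.30]            # applied after each hidden layer

def init_params(rng):
    """Small random weights and zero biases for each dense layer."""
    return [(rng.normal(0, 0.05, size=(m, n)), np.zeros(n))
            for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(x, params, rng=None):
    """Forward pass; dropout is active only when an rng is supplied (training)."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU (assumed activation)
            if rng is not None:
                keep = 1.0 - DROPOUT[i]
                x = x * (rng.random(x.shape) < keep) / keep  # inverted dropout
    x = x - x.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(rng.normal(size=(4, 1000)), params)  # inference mode
n_params = sum(W.size + b.size for W, b in params)   # weights plus biases
```

With dense layers and biases, this architecture has 153,233 trainable parameters, most of them in the first hidden layer.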

Results of the MLP Model

The training–validation process used the following hyperparameters: 30 epochs, a batch size of 32, the Adam optimizer, and a learning rate of \(5 \times 10^{-4}\). The curves of accuracy and loss during this process are shown in Fig. 9. The accuracy curve for the validation set increased to 91–93%, while the accuracy curve of the training set increased to almost 100% at the end. Furthermore, as shown in the loss curves, the validation loss starts to increase slightly from around 0.3, while the training loss continues to decrease. Therefore, the training process was stopped at that point to avoid over-fitting.
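The stopping rule described above can be sketched as a patience-based early-stopping check. The text does not say how the stopping point was chosen, so the patience criterion, the function name, and the loss values below are all assumptions for illustration; the numbers are made up and are not the curves in Fig. 9.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training is considered stopped once the validation loss has not improved
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up validation losses for illustration only.
losses = [0.90, 0.60, 0.45, 0.35, 0.30, 0.31, 0.33, 0.36, 0.40]
stop_at = early_stopping_epoch(losses)
```

Here the loss bottoms out at epoch index 4 and rises afterwards, so the weights from that epoch would be kept.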

Fig. 9 Accuracy and loss for the training–validation process

The test set for this model was built with 1,153 images, of which 71 were meningioma tumors, 789 were glioma tumors, and 293 were pituitary tumors. Once the model was trained, this test set was passed to determine the model’s accuracy for unseen data. Figure 10 shows a confusion matrix to assess how the model performs, with an accuracy of 86.82%.
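Overall accuracy is the sum of the confusion matrix diagonal divided by the total number of test images. The sketch below shows the computation on a toy matrix whose values are invented for illustration and are not those of Fig. 10; it also checks which count of correct predictions is consistent with the reported 86.82% on 1,153 images.

```python
def accuracy(confusion):
    """Overall accuracy from a square confusion matrix (rows = true classes)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Toy matrix with invented counts, only to show the computation.
toy = [[8, 1, 1],
       [2, 15, 3],
       [0, 1, 9]]
toy_accuracy = accuracy(toy)  # 32 correct out of 40

# Number of correct predictions implied by the reported figures.
implied_correct = round(0.8682 * 1153)
implied_accuracy = round(100 * implied_correct / 1153, 2)
```

The reported accuracy corresponds to 1,001 of the 1,153 test images being classified correctly.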

Fig. 10 Confusion matrix for the test set

As shown in Fig. 10, most errors were misclassifications of glioma tumors as meningioma tumors. These misclassifications could occur because of the size and location of the two tumor types. Nevertheless, this can be further explored by changing the amount of data and even the combination of channels. Other preprocessing methods, such as segmentation, may also be applied. Finally, data augmentation can be used to obtain a higher variance, which may allow the model to train further and perform more accurately.

Approach Based on a Deep Neural Network

Recent works on brain tumor diagnosis through MRI have reported improved results using deep learning, which refers to using artificial neural networks (ANN) with several layers [12]. The most emblematic architecture is the CNN, which has shown excellent performance in machine learning problems, especially in image data processing [9]. Its multiple layers include convolutional, non-linearity, pooling, and fully connected layers. Unlike the previous approach, a CNN omits the handcrafted feature extraction that a shallow neural network requires, thanks to several adjacent pairs of convolutional and pooling layers [8]. However, training CNN architectures from scratch requires a lot of annotated data, high computational processing, and exhaustive hyperparameter tuning. A promising alternative to working with CNN is fine-tuning, a technique that allows initializing model weights from a pre-trained model [15, 19].

Classification with a CNN

The primary objective in the second phase was to construct a deep learning model capable of accurately recognizing and classifying brain tumors into meningioma, glioma, and pituitary categories from MRI images. To accomplish this, a straightforward CNN model with three convolutional layers, accompanied by max pooling and downsampling of the output convolution feature maps, was employed to automatically extract features from the images.
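The building block of such a model, a convolution followed by a non-linearity and max pooling, can be sketched in NumPy. The kernel size (3 \(\times\) 3), the single channel, the ReLU non-linearity, the 2 \(\times\) 2 pooling window, and the 32 \(\times\) 32 input are assumptions for illustration; the actual layer settings are those listed in Table 2.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))  # toy single-channel input
for _ in range(3):             # three conv + ReLU + pool stages
    kernel = rng.normal(size=(3, 3))
    x = max_pool_2x2(np.maximum(conv2d_valid(x, kernel), 0.0))
```

Each stage shrinks the feature map (32 to 15, then 6, then 2 per side in this toy case), which is the downsampling effect mentioned above.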

Table 2 CNN model summary

The layered architecture is detailed in Table 2, which is derived from the CNN model summary provided by the Keras API. The MRI dataset was divided into 2,464 images for training, 100 for validation, and 100 for testing, with equal representation across the three categories.

Fig. 11 Accuracy of the training–validation process of CNN

In Fig. 11, the accuracy of the CNN training and validation process is visualized. The training curve illustrates the progressive improvement in accuracy as the network adjusts its weights. Meanwhile, the validation accuracy curve gauges the CNN’s performance on an independent validation dataset, displaying an accuracy surpassing 90%.

ResNet50 Fine-Tuning Model

Fine-tuning is a machine learning technique in which a pre-trained model, often trained on a large dataset, undergoes further training on a specific, smaller dataset tailored to a particular task. This technique is commonly employed in transfer learning, where a model leverages its existing knowledge to excel in new tasks, resulting in significant time and resource savings compared to training a model from scratch [17].

Table 3 Fine-Tuning of ResNet50 Model

One of the frequently used pre-trained models for transfer learning is ResNet50, which was originally developed by Kaiming He et al. in 2015 for the ImageNet competition [7]. This model has 50 layers, consisting of 48 convolutional layers, one MaxPool layer, and one average pool layer. Its most noteworthy characteristic is the residual stacking block, which significantly improves the gradient backpropagation process. Given its impressive performance in image classification tasks, it was a logical choice for our objectives, albeit with a fine-tuning approach [11]. Consequently, after selecting the appropriate layers and fine-tuning hyperparameters, the model was unfrozen and trained for the specific brain tumor classification task. Through a series of experiments, we identified the most effective architecture for this task. The optimal configuration included the ResNet50 block, followed by a dense layer with 512 neurons, culminating in the classification layer with three neurons, as illustrated in Table 3.

In the initial stage of the fine-tuning process, there were 1,050,627 trainable parameters. During this phase, the model’s architecture was devised, data augmentation strategies were determined, and the best-performing configuration from the subsequent cycle was saved. The second phase of fine-tuning involved a substantially larger number of parameters, totaling 24,585,219, since the ResNet50 model was unfrozen.
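Both parameter counts can be reproduced with simple arithmetic. The sketch assumes that the ResNet50 block outputs a 2048-dimensional pooled feature vector and uses the 23,534,592 trainable weights that the Keras ResNet50 without its top layer reports; both are assumptions about the setup, although they match the totals given in the text.

```python
RESNET50_FEATURES = 2048         # pooled ResNet50 output size (assumed)
RESNET50_TRAINABLE = 23_534_592  # Keras ResNet50 without top, trainable weights

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Stage 1: ResNet50 frozen, only the new head is trained.
head = dense_params(RESNET50_FEATURES, 512) + dense_params(512, 3)

# Stage 2: ResNet50 unfrozen, backbone and head are trained together.
unfrozen_total = head + RESNET50_TRAINABLE
```

The head alone accounts for 1,050,627 parameters, and adding the unfrozen backbone gives 24,585,219, in agreement with the two stages described above.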

Fig. 12 Accuracy and Loss of the training–validation process

Fig. 13 Confusion matrix on the test set

The overview of the training and validation process is presented in Fig. 12. The dataset was partitioned as follows: 80% allocated for training and validation, with 20% of the 80% reserved for validation. The remaining 20% constituted the test set. Consequently, the training set contained 1961 images, comprising 453 from the meningioma class, 913 from the glioma class, and 595 from the pituitary class. The validation set included 490 images, with 113 from the meningioma class, 228 from the glioma class, and 149 from the pituitary class.
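The per-class counts of this split can be reproduced by applying the 20% fractions class by class. This is a sketch of the arithmetic only; rounding each share to the nearest integer is an assumption (the text does not state the rounding rule), chosen because it matches the reported numbers, and the function name is hypothetical.

```python
CLASS_COUNTS = {"meningioma": 708, "glioma": 1426, "pituitary": 930}

def stratified_counts(counts, test_frac=0.2, val_frac=0.2):
    """Per-class sizes of a stratified train/validation/test split."""
    split = {"train": {}, "val": {}, "test": {}}
    for name, n in counts.items():
        n_test = round(test_frac * n)            # 20% of the class for testing
        n_val = round(val_frac * (n - n_test))   # 20% of the remainder
        split["test"][name] = n_test
        split["val"][name] = n_val
        split["train"][name] = n - n_test - n_val
    return split

split = stratified_counts(CLASS_COUNTS)
totals = {part: sum(per_class.values()) for part, per_class in split.items()}
```

The resulting totals are 1961 training, 490 validation, and 613 test images, with every class keeping its original share in each subset.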

Finally, Fig. 13 summarizes the performance of the model in a confusion matrix. The test set had 613 images, of which 142 were from the meningioma class, 285 were from the glioma class, and 186 were from the pituitary class. Data augmentation consisted of flipping the images from left to right and applying some random saturation, and the model was run with different learning rates to determine the best configuration for this task.
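The augmentation mentioned above can be sketched with NumPy. The flip probability of 0.5, the saturation range, the three-channel float input, and the function name are assumptions for illustration; the text does not specify these settings or the library used.

```python
import numpy as np

def augment(image, rng, sat_range=(0.8, 1.2)):
    """Random left-right flip plus a random saturation scaling.

    image: float array of shape (height, width, 3) with values in [0, 1].
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # flip left to right
    # Scale each pixel's distance from its gray value to change saturation.
    gray = image.mean(axis=2, keepdims=True)
    factor = rng.uniform(*sat_range)
    return np.clip(gray + factor * (image - gray), 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))  # toy image, not real MRI data
out = augment(img, rng)
```

A factor below 1 pulls the channels toward gray, and a factor above 1 pushes them apart; the clip keeps the result in the valid range.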

Conclusions

The first approach involved an exploratory analysis of the MRIs, where histograms were used to assess data distribution across classes and channels. In the preprocessing phase, we resized the smaller images so that all images shared the same 512 \(\times\) 512 resolution and conducted image feature extraction through PCA. These simple operations significantly improved the MLP’s classification performance. Although we employed only these basic techniques, the experimental results were promising. The second approach encompassed the development of a CNN model from scratch and, subsequently, a deep network based on the ResNet50 model through fine-tuning. Working with pre-trained models allowed us to achieve strong classification results with a small dataset in less time, in contrast to MLP and CNN models built from the ground up.

Given the limited training data and the repetitive instances across each epoch, all neural networks in this study were susceptible to over-fitting. However, we successfully mitigated over-fitting in the classification model through regularization techniques, specifically dropout. This study yielded classification accuracies of approximately 97% (\({\pm }\)0.4%), surpassing results reported by other authors who worked with the same dataset.

As further work, it would be useful to have a control set to verify the accuracy of the model in discriminating healthy brains from brains with tumors, and in specifying the type of tumor present in the brain.