Introduction

Brain tumors are masses that arise from irregular brain cell proliferation and the loss of the brain's regulatory mechanisms. Approximately 700,000 people worldwide are living with brain tumors, with 86,000 new cases identified in 2019, and nearly 16,380 people died from brain tumors in that year [1]. Gliomas are the most prevalent malignant brain tumors [2], representing 80% of all malignant brain tumors. Early identification of brain tumors is crucial in the diagnosis of cancer, as it aids in the selection of the most appropriate treatment strategy to preserve a patient's life.

However, the detection and segmentation of tumors based on imaging and human interpretation is challenging, as tumors can have a mixture of low-grade and high-grade characteristics. In addition, manual interpretation is often time-consuming and does not always yield high accuracy [2]. A non-invasive, fully automatic computer-aided diagnosis (CAD) method might aid clinical specialists in early diagnosis, analysis and treatment planning and decrease the mortality rate by providing more consistent, faster and more accurate tumor identification [3]. For this reason, a number of CAD-based image segmentation methods are currently being investigated for this application [4]. The advance of convolutional neural networks (CNNs) has led to noteworthy progress in the field of computer vision and to a variety of deep learning-based brain tumor segmentation techniques [5].

In this study, we concentrate on tumor segmentation, which is regarded as one of the most difficult challenges in multimodal MRI. The multimodal Brain Tumor Segmentation Challenge 2020 dataset (BraTS 2020) is used for this research. Four MRI modalities are included for each patient in the dataset, together with the corresponding manually segmented region of interest (ROI). The main contributions of this research can be stated as follows:

  • A convolutional network (U-Net model) is designed for relatively quick and accurate image segmentation. In the ISBI challenge, this architecture outperformed the previous best technique for segmenting neuronal structures. The architecture is configured with skip connections to boost performance.

  • Instead of employing fixed hyper-parameters, a novel approach is presented to develop the model architecture. In this regard, the model is optimized through an ablation study in which a number of experiments are conducted by altering different hyper-parameters.

  • Rather than training the model with 3D images, we employ a single slice of each 3D MRI, as the aim of our approach is to minimize the computational cost while achieving the best possible performance.

  • In the pre-processing step, the middle slice of each 3D MRI is extracted automatically and a normalization step is carried out, which improves the overall performance.

  • The model is trained and validated on each of the four sequences separately to determine which sequence results in the highest accuracy.

  • Across different hyper-parameters, the model is able to yield optimal performance, which further validates the robustness and performance consistency of the architecture.

Literature Review

Various innovative approaches for automated brain tumor segmentation have been presented in recent years. Pereira et al. [6] presented an automated segmentation approach using a CNN model. Their approach was tested on the BraTS 2013 dataset, achieving dice coefficient (DC) scores of 88, 83 and 77% for the whole, core and enhancing regions, respectively. Other authors [7] presented a computerized system that can distinguish between a normal and an abnormal brain based on MRI images and classify abnormal brain tumors as high-grade glioma (HGG) or low-grade glioma (LGG). With accuracy, specificity and sensitivity scores of 99%, 98.03% and 100%, respectively, their system effectively identified HGGs and LGGs. Noori et al. [8] developed a low-parameter 2D U-Net that uses two different approaches. Using the BraTS 2018 validation dataset, they obtained dice scores of 89.5, 81.3 and 82.3% for whole tumor (WT), enhancing tumor (ET) and tumor core (TC), respectively. Xu et al. [9] enhanced tumor segmentation performance with a multi-scale masked 3D U-Net, which captures features by assembling multi-scale images as input and integrating a 3D atrous spatial pyramid pooling (ASPP) layer. They obtained dice scores of 80.94, 90.34 and 83.19% on the BraTS 2018 validation set for ET, WT and TC, respectively. On the BraTS 2018 test dataset, this technique achieved dice scores of 76.90, 87.11 and 77.92% for ET, WT and TC, respectively. Other authors [10] also used the BraTS dataset to evaluate a U-Net model, undertaking two experiments to compare the network's performance. Comparing the results of the first and third sets, the accuracy in detecting HGG patients was 66.96% and 62.22%, respectively. For LGG, the second and third sets' accuracies were 63.15% and 62.28%, respectively. Pravitasari et al. [11] proposed a hybrid of a U-Net and a VGG16 model; the model was able to detect the tumor region with an average CCR score of 95.69%. SegNet3, U-Net, SegNet5, Seg-UNet, Res-SegNet and U-SegNet were used on BraTS datasets in another study [12], where the average accuracies of Res-SegNet, U-SegNet and Seg-UNet were recorded as 93.3%, 91.6% and 93.1%, respectively. According to Pei et al. [13], feature-based fusion methods can predict a better tumor lesion segmentation for the whole tumor (DSCWT = 0.31, p = 0.15), tumor core (DSCTC = 0.33, p = 0.0002) and enhancing tumor (DSCET = 0.44, p = 0.0002) regions. However, in the actual segmentation of WT and ET (DSCWT = 0.85 ± 0.055, DSCET = 0.837 ± 0.074), the approach provided a statistically significant improvement. Lin et al. [14] proposed a 3D deep learning architecture named context deep-supervised U-Net to segment brain tumors from 3D MRI scans. The suggested approach acquired DSCs of 0.92, 0.89 and 0.846 on WT, TC and ET, respectively. Punn et al. [15] proposed a 3D U-Net model for brain tumor segmentation using 3D MRI across the WT, TC and ET lesions. Their model performed best, achieving the highest DSCs of 0.92 on WT, 0.91 on TC and 0.84 on ET. In both of these studies, data normalization was conducted as a pre-processing step. Ullah et al. [16] proposed a 3D U-Net model for the automatic segmentation of brain tumors using 3D MRI scans. A number of image pre-processing techniques were applied to remove noise and enhance the quality of the 3D scans. The suggested method achieved mean DSCs of 0.91, 0.86 and 0.70 for the WT, TC and ET, respectively.
Table 1 represents the main methodology and limitations of some of these papers.

Table 1 Methodology and limitations of the previous literature

It is observed from Table 1 that most of the papers lack image processing and model hyper-parameter tuning. A few studies used 3D CNN models, which are computationally expensive due to the 3D convolutional layers. As a 3D brain MRI comprises multiple modalities, these modalities should be explored when experimenting with deep learning models for automated segmentation, which is another limitation found in some papers. In this study, we attempt to address these drawbacks in order to develop an optimal U-Net model with the highest possible performance.

Dataset

Our system is trained and evaluated with the BraTS 2020 dataset, collected from Kaggle [17]. 3D scans from a total of 473 subjects are available in the dataset. For every patient, four MRI sequences: fluid-attenuated inversion recovery (FLAIR), T1-contrast-enhanced (T1ce), T1-weighted (T1) and T2-weighted (T2), as well as the corresponding ROI (seg), are provided. The ground truths were labeled by experts. Each 3D volume includes 155 2D slices/images of brain MRI collected at various locations across the brain. Every slice is 240×240 pixels in size, stored in NIfTI format, and is made up of single-channel grayscale pixels. The dataset is summarized in Table 2.

Table 2 Dataset description

Visualization of 3D MRI Scans

A 2D image consists of single- or multi-channel pixels, whereas 3D images comprise 3D cubes or voxels. When reading a NIfTI file, all the information in the file is encoded, where each detail is known as an attribute. When visualizing a 3D image (Fig. 1), a list is initialized; whenever a volume is read, the code iterates over all 155 slices of the 3D volume and appends each slice to the list sequentially. The number of voxels in a 3D image is calculated with Eq. (1).

$${V}_{t}={S}_{t}\times {H}_{s}\times {W}_{s},$$
(1)

where \({V}_{t}\) is the total voxel number of an image, \({S}_{t}\) is the number of 2D slices of a 3D image, \({H}_{s}\) is the height of each slice and \({W}_{s}\) is the width of each slice. Figure 1 illustrates the visualization of a 3D MRI.
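
To make the slice handling concrete, the following minimal sketch reads one NIfTI volume, collects its 155 slices in a list and computes the voxel count of Eq. (1). It assumes the nibabel library and uses an illustrative file name; it is not the authors' exact implementation.

```python
# Sketch: read a NIfTI volume, collect its slices and count voxels (Eq. 1).
import nibabel as nib

volume = nib.load("BraTS20_Training_001_flair.nii")     # illustrative file name
data = volume.get_fdata()                                # array of shape (240, 240, 155)

slices = [data[:, :, k] for k in range(data.shape[2])]   # append each 2D slice sequentially

height, width, n_slices = data.shape
total_voxels = n_slices * height * width                 # V_t = S_t * H_s * W_s
print(total_voxels)                                      # 240 * 240 * 155 = 8,928,000
```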

Fig. 1
figure 1

Visualization of 3D brain MRI

Description of MRI Sequences and Corresponding ROI

The FLAIR sequence is generated with a very long time to echo (TE) and repetition time (TR); abnormalities such as edematous tissue appear bright, while normal cerebrospinal fluid (CSF) remains comparatively dark. In T1-weighted MRI sequences, TR and TE are both kept short, with the result that both tumor regions and CSF appear dark. T2-weighted sequences are produced using long TE and TR times, which make both tumor and CSF bright. In the T1ce sequence, tumor and CSF both appear dark. All the sequences therefore have different characteristics, and all of them are used for training to determine which sequence results in the best segmentation performance in terms of accuracy. 2D slices of all the sequences are shown in Fig. 2 for a single subject.

Fig. 2
figure 2

2D representation of all four sequences and segmented ROI of a 3D MRI

Proposed Methodology

In the data pre-processing step, normalization and rescaling are applied to each of the four MRI sequences. Afterwards, every sequence is passed as the training dataset into the model. The corresponding manually segmented ROI (seg) is employed as the ground truth for each sequence separately. The Adam optimizer with a learning rate of 0.001 is used for training the model. For the four sequences, four results are produced, namely res-1, res-2, res-3 and res-4 (Fig. 3). Finally, based on the accuracy for each particular sequence, the optimal resulting model is obtained.

Fig. 3
figure 3

Proposed methodology

Data Preprocessing

Brain tumor classification using 3D MRI scans is usually difficult and computationally complex, requiring pre-processing techniques to improve the model’s performance [18, 19]. In this research, two pre-processing steps are performed for every MRI sequence before training the model: intensity normalization and rescaling.

Normalization

As MRI intensities differ based on scanner manufacturers and acquisition parameters, and different sequences are acquired over different periods, the 3D images need to be normalized before being fed to a model. As scanning of patients is likely to be performed under different acquisition conditions, intensity normalization plays an imperative role in brain tumor segmentation.

Data normalization means transforming floating-point feature values from their natural range into a new standard range, usually between 0 and 1 [21]. Min–max normalization is a widely used normalization technique and is performed in our research on each MRI sequence independently. In this process, for each feature, the minimum value is mapped to 0 and the maximum value to 1, with the remaining values transformed into the range between 0 and 1. Normalization is performed using Eq. (2).

$${z}_{i}=\frac{{p}_{i}-\mathrm{min}(p)}{\mathrm{max}(p)-\mathrm{min}(p)},$$
(2)

where \(p=({p}_{1},\ldots,{p}_{n})\) denotes the pixel values and \({z}_{i}\) is the \(i\)th resultant normalized value.
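
A minimal NumPy sketch of this min–max normalization is given below; it is illustrative only and assumes the normalization is applied to one image array at a time.

```python
# Sketch of min-max normalization (Eq. 2), applied to one image array.
import numpy as np

def min_max_normalize(image: np.ndarray) -> np.ndarray:
    p_min, p_max = image.min(), image.max()
    if p_max == p_min:                                   # guard against a constant image
        return np.zeros_like(image, dtype=np.float32)
    return ((image - p_min) / (p_max - p_min)).astype(np.float32)
```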

Rescaling

Due to GPU memory limitations, after normalization, the dataset is resampled to 128×128×1 voxels [9]. Only the middle slice of each volume is utilized instead of all 155 slices of the brain, where the original dimensions of a 3D brain MRI are 240 × 240 × 155 voxels. As experimenting with 3D data requires high computational resources, we use an Intel Core i5-8400 processor, an NVidia GeForce GTX 1660 GPU, 8 GB of memory, and 256 GB DDR4 SSD for storage.
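
The following sketch shows one way the middle-slice extraction and rescaling could be implemented; the use of OpenCV for resizing is an assumption, not necessarily the authors' choice.

```python
# Sketch: extract the middle axial slice of a volume and rescale it to 128x128x1.
import numpy as np
import cv2  # OpenCV is assumed here purely for the resizing step

def extract_middle_slice(volume: np.ndarray, size: int = 128) -> np.ndarray:
    middle_index = volume.shape[2] // 2                    # slice 77 of 155
    slice_2d = volume[:, :, middle_index]
    resized = cv2.resize(slice_2d, (size, size), interpolation=cv2.INTER_AREA)
    return resized[..., np.newaxis]                        # shape: (128, 128, 1)
```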

Data Split

Each of the preprocessed sequences and the corresponding ROI images is split into two sets (training and validation), maintaining a training-to-validation ratio of 70:30. The segmented ROI images (seg) are used as training and validation labels while training and validating the model. Figure 4 illustrates the split.
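
A sketch of the 70:30 split is shown below; the array names and the fixed random seed are illustrative assumptions rather than details given in the paper.

```python
# Sketch of the 70:30 training/validation split for one MRI sequence.
from sklearn.model_selection import train_test_split

# X: preprocessed slices of one sequence, y: corresponding ROI masks,
# both assumed to be arrays of shape (473, 128, 128, 1).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=42)
```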

Fig. 4
figure 4

Dataset split

Proposed Architecture

U-Net Architecture

For the task of segmenting tumor regions from MRI images, the U-Net segmentation model is used in this research. It is a widely accepted architecture [20], developed by Ronneberger et al. [21] for segmenting biomedical images. U-Net utilizes a compact encoder–decoder structure that concatenates features from the encoder levels with the corresponding decoder layers [22]. Figure 5 illustrates the U-Net architecture.

Fig. 5
figure 5

U-Net model architecture

U-Net consists of a total of 23 convolutional layers and is made up of two paths: a contraction path (the encoder) and a symmetric expanding path (the decoder) (Fig. 5). The encoder, or contraction path, captures image context and is made of repeated stacks of convolution and max-pooling layers, similar to a CNN. In the encoder, the dimensions of the image gradually decrease. Each encoder stage has a pair of 3 \(\times\) 3 unpadded convolutional layers, each followed by a rectified linear unit (ReLU), and a max-pooling layer with a 2 \(\times\) 2 kernel. Max-pooling layers are used for downsampling the image. The first set of convolutional layers has 64 filters and the filter number is doubled in each subsequent set of convolutional layers (e.g., the second set of convolutional layers has 64 \(\times\) 2 = 128 filters). The decoder, or symmetric expanding path, uses transposed convolutions (deconvolution layers) that enable precise localization of the ROI in images. The size of the feature maps increases gradually while their depth decreases because of the deconvolutional layers. In the decoder, upsampling of the feature maps is carried out by 2 \(\times\) 2 deconvolution layers, which halve the number of feature channels. Subsequently, concatenation with the correspondingly cropped feature map of the encoder is done, followed by two 3 \(\times\) 3 convolution layers and ReLU. The basic working formula of ReLU is [23]:

$${\text{ReLU}}\left( q \right) = \begin{cases} 0, & \text{if } q \le 0 \\ q, & \text{otherwise.} \end{cases}$$
(3)

The output layer of this architecture consists of a 1 \(\times\) 1 convolutional layer. This maps each feature vector into one of two classes (ROI and image background), outputting an image in which the segmented ROI (tumor) appears in one color and the rest of the image in another. A sigmoid activation function (Eq. (4)) is used to improve accuracy. The provided pre-segmented ROI images are used as validation data [23].

$$\mathrm{Sigmoid }\,\left(q\right)=\frac{1}{1+\mathrm{exp}\left(-q\right)}.$$
(4)

As this architecture does not contain any dense layers, it is a fully convolutional network (FCN) capable of taking any size of image as input. Moreover, the model uses skip connections to reduce the loss of information. In developing a deep learning architecture, skip connections are introduced to address several issues; in particular, they allow feature reusability while facilitating model training and convergence. In this study, skip connections are employed to improve the overall performance and capability of the model. As skip connections allow a continuous gradient flow from the first layer to the last, they effectively address the vanishing gradient problem of the proposed U-Net model.
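
To make the architecture concrete, a compact Keras sketch of such a 2D U-Net is given below. It is a sketch under stated assumptions rather than the authors' exact implementation: it uses padded ("same") convolutions instead of the unpadded ones of the original U-Net so that the 128 \(\times\) 128 input and output sizes match, and the filter counts follow the 64-to-1024 doubling scheme described above.

```python
# Compact sketch of a 2D U-Net in Keras (padded convolutions, sigmoid output).
from tensorflow.keras import layers, models

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, as used in each U-Net stage."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(128, 128, 1)):
    inputs = layers.Input(input_shape)

    # Encoder: convolution blocks + 2x2 max pooling, doubling the filters each level
    c1 = conv_block(inputs, 64);  p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 128);     p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 256);     p3 = layers.MaxPooling2D(2)(c3)
    c4 = conv_block(p3, 512);     p4 = layers.MaxPooling2D(2)(c4)

    # Bottleneck
    b = conv_block(p4, 1024)

    # Decoder: 2x2 transposed convolutions + skip connections (concatenation)
    u4 = layers.Conv2DTranspose(512, 2, strides=2, padding="same")(b)
    c5 = conv_block(layers.concatenate([u4, c4]), 512)
    u3 = layers.Conv2DTranspose(256, 2, strides=2, padding="same")(c5)
    c6 = conv_block(layers.concatenate([u3, c3]), 256)
    u2 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(c6)
    c7 = conv_block(layers.concatenate([u2, c2]), 128)
    u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c7)
    c8 = conv_block(layers.concatenate([u1, c1]), 64)

    # 1x1 output layer with sigmoid for the binary tumor/background mask
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c8)
    return models.Model(inputs, outputs)
```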

Training Strategy

The model proposed in this paper is trained and validated separately four times, once for each MRI sequence, with a total of 473 images containing only a single grayscale channel. The corresponding manually segmented ROIs are used as true labels. During training, the Adam optimizer is used with a learning rate of 0.001 [24]. Equation (5) is the mathematical formula of the Adam optimizer [25].

$${\omega }_{t}={\omega }_{t-1}-\eta \frac{{\widehat{m}}_{t}}{\sqrt{{\widehat{v}}_{t}}+\varepsilon },$$
(5)

where \(\omega_{t}\) is the model weight at step \(t\), \(\eta\) is the step size (learning rate), and \({\beta }_{1}, {\beta }_{2}, \varepsilon\) are the required hyper-parameters. Here, the first-moment estimate is \({m}_{t}={\beta }_{1}{m}_{t-1}+(1-{\beta }_{1}){g}_{t}\) and the squared-gradient (second-moment) estimate is \({v}_{t}={\beta }_{2}{v}_{t-1}+\left(1-{\beta }_{2}\right){{g}_{t}}^{2}\); the bias-corrected moments are \({\widehat{m}}_{t}={m}_{t}/\left(1-{{\beta }_{1}}^{t}\right)\) and \({\widehat{v}}_{t}={v}_{t}/\left(1-{{\beta }_{2}}^{t}\right)\). The batch size is set to 32 and binary cross-entropy is employed as the loss function. Binary cross-entropy [22] is defined as:
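
For illustration, a one-step NumPy sketch of the Adam update in Eq. (5) is given below; in practice, the Keras implementation of Adam is used, so this is purely expository.

```python
# Expository NumPy sketch of a single Adam update step (Eq. 5).
import numpy as np

def adam_step(w, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * g ** 2       # second-moment estimate v_t
    m_hat = m / (1 - beta1 ** t)               # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```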

$${L}_{c}=-\frac{1}{n}\sum_{i=1}^{n} \left\{{y}_{i} \mathrm{log }\left({f}_{i}\right)+\left(1-{y}_{i}\right)\mathrm{ log}\left(1-{f}_{i}\right)\right\},$$
(6)

where \(n\) is the number of samples, \({y}_{i}\) is the true label of a specific sample and \({f}_{i}\) is its predicted label. The model is trained for 50 epochs [26]. All the training parameters for training the network are described in Table 3.
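
A minimal Keras training sketch consistent with Table 3 is shown below; the array and function names (e.g., build_unet, X_train) are the illustrative ones from the earlier sketches, not the authors' code.

```python
# Sketch: compile and train the U-Net with the hyper-parameters from Table 3.
import tensorflow as tf

model = build_unet()                                   # from the architecture sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=32,
                    epochs=50)
```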

Table 3 Training hyper-parameters

Results and Discussion

The DSC, sensitivity [27] and specificity [28] are used to evaluate the model, in addition to accuracy and validation accuracy [29], based on the provided segmented ground truth of the tumor region in the MRI.

Evaluation Metrics

The DSC metric computes the percentage similarity between the ground truth and the output of a model. Suppose P and Q are two sets; the dice similarity of these two sets is then calculated with Eq. (7) [23].

$$DSC=\frac{2\times |P\cap Q|}{|P|+|Q|},$$
(7)
$$\mathrm{Sensitivity }\,(P,T)=\frac{\left|{P}_{1}\wedge {T}_{1}\right|}{\left|{T}_{1}\right|}.$$
(8)

Sensitivity is calculated with Eq. (8), where |P| and |Q| in Eq. (7) denote the cardinalities of sets P and Q, \({T}_{1}\) represents the tumor regions of the ground truth images and \({P}_{1}\) represents the tumor regions predicted by the model [5].

$$\mathrm{Specificity }\,(P,T)=\frac{\left|{P}_{0}\wedge {T}_{0}\right|}{\left|{T}_{0}\right|}.$$
(9)

Specificity is calculated with Eq. (9) where \({T}_{0}\) represents non-tumor tissue regions of the ground truth and \({P}_{0}\) represents the non-tumor tissue regions predicted by the model [5].
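
The three metrics can be computed from binary masks as in the short sketch below; it assumes the sigmoid outputs are thresholded at 0.5 to obtain binary predictions, which is a common convention rather than a detail stated in the paper.

```python
# Sketch: DSC (Eq. 7), sensitivity (Eq. 8) and specificity (Eq. 9) for binary masks.
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    intersection = np.logical_and(pred == 1, truth == 1).sum()
    return 2.0 * intersection / ((pred == 1).sum() + (truth == 1).sum())

def sensitivity(pred: np.ndarray, truth: np.ndarray) -> float:
    return np.logical_and(pred == 1, truth == 1).sum() / (truth == 1).sum()

def specificity(pred: np.ndarray, truth: np.ndarray) -> float:
    return np.logical_and(pred == 0, truth == 0).sum() / (truth == 0).sum()
```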

Performance Computation of all Trained Models

In Table 4, validation loss is referred to as ‘V_Loss’, validation accuracy as ‘V_Acc’, specificity as ‘Spe’, sensitivity as ‘Sen’ and the dice similarity coefficient score as ‘DSC’.

Table 4 Validation accuracy (V_Acc), validation loss (V_Loss), specificity (Spe), sensitivity (Sen) and dice similarity coefficient (DSC) scores of the approach

Among all the model configurations trained with different MRI sequences, U-Net trained with the T1 MRI sequence results in the best validation accuracy of 99.41% with a validation loss of 0.037. The T2 MRI sequence results in the lowest validation accuracy of 98.25% with a validation loss of 0.08. The validation accuracies of the FLAIR and T1ce sequences are 98.95% and 98.68%, respectively. In terms of specificity and sensitivity, U-Net trained on the T1 MRI sequence has the best performance, with scores of 99.68% and 98.97%, respectively. The T2 MRI sequence has the lowest scores in this regard, with a specificity of 98.37% and a sensitivity of 98.49%. The most important measure of a segmentation model's performance is the DSC; for this metric, the best score (93.86%) is also achieved with the T1 MRI sequence trained on the U-Net segmentation model. The FLAIR MRI sequence results in a DSC of 91.23%, which is slightly lower than the DSC for the T1 MRI sequence. The lowest DSC of 79.32% is recorded with the T2 MRI sequence.

Moreover, further experiments are conducted to assess the impact of the model hyper-parameters on performance [30]. As the highest performance is achieved with the T1 modality, the hyper-parameter experiments are conducted using the T1 dataset. In this regard, different batch sizes, numbers of epochs, loss functions, optimizers, learning rates and dropout factors are investigated. Table 5 shows the results of changing these hyper-parameters.

Table 5 Performance of the model across different hyper-parameters

It is observed from Table 5 that across different hyper-parameters, the model is able to provide optimal outcomes. For batch size 32, the highest DSC of 93.86% is obtained, whereas near-highest performance is achieved for batch sizes of 16 and 64. However, with batch size 128, the performance drops slightly. Across all three epoch configurations (50, 100 and 150), the highest DSC of 93.86% is acquired; however, to minimize the training time, the epoch number is kept at 50. For both loss functions, binary cross-entropy and categorical cross-entropy, the highest DSC of 93.86% is recorded. Regarding the optimizer, a DSC > 93% is achieved for Adam, Nadam and Adamax, with the highest DSC found for Adam; the SGD optimizer records a slightly lower DSC of 92.43%. Across all learning rates except 0.01, a DSC > 93% is achieved. A learning rate of 0.001 is selected since this configuration yields the highest DSC and requires comparatively less training time. Three dropout factors of 0.2, 0.5 and 0.8 are tested; for 0.2 and 0.5, the highest DSC of 93.86% is obtained, whereas for a dropout factor of 0.8, performance falls to 91.38%. From the results of these experiments, it can be concluded that the model is stable and robust enough to yield optimal performance across different hyper-parameters. No sign of overfitting is perceived while training the model with altered hyper-parameters. Furthermore, over different dropout factors, the model maintains its performance consistency with no occurrence of overfitting.
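
As an illustration of how such an ablation could be scripted, the sketch below sweeps the batch size; the other hyper-parameters (epochs, loss, optimizer, learning rate, dropout) could be swept in the same way. It reuses the illustrative helpers from the earlier sketches and is not the authors' actual experiment code.

```python
# Sketch: batch-size ablation on the T1 sequence, reporting validation DSC.
for batch_size in (16, 32, 64, 128):
    model = build_unet()                                 # fresh model per configuration
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              batch_size=batch_size, epochs=50, verbose=0)
    pred = (model.predict(X_val) > 0.5).astype("uint8")  # threshold sigmoid outputs
    print(batch_size, dice_coefficient(pred, y_val))
```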

Comparison with Existing Literature

The proposed brain MRI segmentation approach is compared with some recent studies conducted on similar datasets on the basis of DSC scores in Table 6.

Table 6 Comparison with existing literature

With our proposed approach, a dice score of 93.9% is achieved. Wei et al. [5] experimented with the BraTS 2018 dataset and proposed a 3D segmentation model (S3D-UNet), which is a U-Net-based segmentation model. Using this model, they achieved a DSC of 83%. A similar approach was taken by Yanwu et al. [9], who proposed a 3D segmentation model (3D U-Net) based on an existing U-Net segmentation model, achieving a DSC of 89% with the BraTS dataset. A similar approach to our study was taken by [31], where a U-Net segmentation model was utilized for segmenting MRI images from the BraTS 2018 dataset, achieving a DSC of 82% (Table 6). A better result was obtained by [23], who proposed a BU-Net segmentation model using the BraTS 2017 dataset for training and validation; the BraTS 2018 dataset was also used to test their proposed model, resulting in a 90% DSC. Ghosh et al. [20] experimented with two models, a baseline U-Net and a U-Net with ResNeXt50, for brain tumor segmentation using the LGG segmentation dataset (TCGA-LGG). Their results suggest that the U-Net with ResNeXt50 outperformed the baseline U-Net with a DSC of 93.2%. The approach proposed in this paper outperforms all studies shown in Table 6 with a DSC of 93.9%. Moreover, as 3D data processing and the development of a corresponding deep learning model require high computational power, most of these prior studies employed high-end hardware configurations in terms of processor, memory and GPU.

Conclusion

In this paper, a CNN-based 2D U-Net segmentation model is proposed for segmenting brain tumors from MRI images. This method receives each of the MRI sequences individually as training data and extracts the ROIs (tumor regions in brain MRI). To decrease the computational cost and increase efficiency, the input images are normalized and rescaled into single 128 × 128 slices. We incorporate 2D layers to better integrate the information. The model is trained with the BraTS 2020 brain MRI dataset for each of the four MRI sequences (FLAIR, T1, T1ce, T2) to find which sequence gives the best segmentation performance based on the ground truth of the MRI images. The highest DSC (93.9%) is recorded when training the U-Net model with the T1 MRI sequence. Results are compared with previous research, demonstrating that this is a promising approach with decreased computational complexity.

Limitation and Future Work

This paper presents a novel approach for the automated segmentation of brain tumors from 3D MRI scans using an optimized U-Net model. In future work, the research can be extended by increasing the number of images; in this regard, other 3D brain MRI datasets can be explored. The deep learning architecture can be further optimized with hybrid CNN or attention mechanism-based approaches. Moreover, we aim to present a classification of different types of brain tumors after segmenting the tumor region from 3D MRI scans.