1 Introduction

Brain tumour is a serious disease in which an abnormal growth of tissue inside the brain can disrupt proper brain function. The National Brain Tumor Foundation (NBTF) reported that the number of people in developed countries who have died due to brain tumours has increased by 300% over the last three decades [1, 2]. Manual brain tumour diagnosis by physicians is both time-consuming and prone to error [3, 4].

Magnetic resonance imaging is an advanced medical imaging technique providing rich information about the human soft tissue anatomy [5]. Automatic brain tumour detection from magnetic resonance images (MRIs) aims to classify MRIs into normal and abnormal according to the absence or presence of a tumour, respectively. Thus, the diagnosis problem can be addressed as an image classification challenge. The pervasiveness of brain tumours has led to the production of a massive amount of MRI data. Therefore, developing an automatic brain tumour diagnosis system that achieves high tumour detection and localization accuracies is a vital need [6]. Machine learning in general and deep learning approaches in particular play core roles in computer-assisted brain image analysis, segmentation, registration, and tumour tissue classification [7, 8].

Recently, deep learning paradigms have received considerable research attention. Deep learning models have multiple layers that can be trained using either supervised or unsupervised training approaches [9, 10]. In recent studies, deep learning models have achieved competitive results compared to the classical learning models in brain tumour diagnosis and tumour image classification [11, 12]. In addition, deep learning models like convolutional neural networks (CNNs) have improved the field of object detection and classification in different applications [13, 14]. As a deep learning paradigm, CNNs [15] have been used to extract high-level features from raw image data [16].

Although several studies have targeted the application of deep learning methods to brain tumour diagnosis, a complete system for automatic tumour detection and localization is not yet available in the literature. In addition, integrating tumour detection and localization within a single diagnosis system is still an open challenge. Placing the tumour detection phase before the tumour localization phase in a single system means that normal images are discarded before they reach the localization phase. This opens the door to real-time deployment of such automatic tumour diagnosis systems, since no time or computing power is spent on locating tumours in normal images.

This research addresses the aforementioned challenges by developing a novel and complete computer-aided diagnosis (CAD) system for tumour detection and localization from MRIs. In the tumour detection phase, the system combines a CNN, which is used for feature extraction due to its ability to learn features from raw data, with an error-correcting output codes support vector machine (ECOC-SVM), which is used for feature classification. The system is considered a two-phase multi-model artefact due to its detection and localization abilities using different CNN models.

Three CNN models, namely, AlexNet, Visual Geometry Group (VGG)-16, and VGG-19, have been deployed and evaluated for tumour detection. A five-layer fully developed region-based convolutional neural network (R-CNN) has been used in the second phase for tumour localization. The accuracy of the developed two-phase multi-model system has been evaluated using MRIs extracted from two different databases: the Reference Image Database to Evaluate Response (RIDER) Neuro MRI database [17] and the Brain Tumor Segmentation (BraTS) 2013 database.

This study makes four key contributions. First, a complete two-phase multi-model system for brain tumour detection and localization that combines a CNN and an ECOC-SVM is implemented and evaluated. Although CNNs and SVMs are not new approaches, the state-of-the-art in this field does not yet include systems similar to the one developed here, i.e., with these specific phases and CNN learning models. Second, integrating tumour detection and localization in one artefact is a step towards real-time deployment of the research findings. Third, 349 MRIs from the RIDER Neuro MRI dataset are extracted and manually annotated, and this dataset will be available as a standard benchmark for other researchers who conduct research on tumour detection. Fourth, the conducted experiments have shown the superiority of the developed system in terms of tumour detection and tumour localization.

The remainder of this article is organized as follows. Section 2 is devoted to reporting related work on both the detection and localization of brain tumours. The description of the two-phase multi-model tumour diagnosis system with respect to design and implementation is provided in Section 3. The experimental studies and the simulations for evaluating the whole diagnosis system in terms of tumour detection and localization accuracy are documented in Section 4. Finally, concluding remarks are outlined in Section 5.

2 Related work

Brain tumour detection has received substantial research attention, and several detection methods have been presented over the past two decades. In [18], a combination of morphological filters, discrete wavelet transforms (DWTs), principal component analysis (PCA), and kernel support vector machines (KSVM) was applied to classify MRIs as normal or abnormal. Further improvements were realized by Abd-Ellah et al. [19], who applied a double classifier to classify an abnormal image as benign (noncancerous) or malignant (cancerous). Devasena and Hemalatha [20] proposed a CAD system for the detection of abnormal parts in MRIs using a hybrid abnormality detection algorithm (HADA).

In [21], Patil and Udupi used preprocessing, segmentation, feature extraction, and a probabilistic neural network (PNN) method to identify brain tumours. Arakeri and Reddy [22] applied three different classifiers: an SVM, an artificial neural network (ANN), and a k-nearest neighbour (k-NN) algorithm to distinguish between different types of tumours. Dandıl et al. [23] proposed a system for brain tumour classification where spatial-fuzzy C-means (FCM) was applied for brain tumour segmentation, and an SVM was used in the classification stage.

Goswami and Bhaiya [24] presented a new MRI brain tumour classification based on an ANN. Histogram equalization, noise filtering, and edge detection were applied in the preprocessing stage. The features were extracted by independent component analysis (ICA). A self-organizing map neural network (SOMNN) was applied in the classification stage. Deepa and Devi [25] proposed a system consisting of feature extraction, classification, and tumour segmentation. Optimal texture features were extracted from tested images using statistical features. A back propagation neural network (BPN) and a radial basis function neural network (RBFN) were used in the classification and segmentation stages, respectively. A short comparison of the aforementioned classical learning-based diagnosis methods is provided in Table 1.

Table 1 Comparison of the available traditional classification approaches used in brain tumour detection

Few deep learning-based methods for brain tumour diagnosis have been reported in the literature. In [26], brain CT image classification based on a deep neural network (DNN) was presented. A gray-level co-occurrence matrix (GLCM) was used for feature extraction, and a DNN was applied for classification. The average accuracy of the reported method was 83%. Gao et al. [27] introduced a CT brain image classification method that integrated both 2D and 3D CNNs. Each network consisted of seven layers, and the average classification accuracy was 87.6%. Xu et al. [28] proposed a method using deep convolutional activation features for brain tumour classification and segmentation, with a classification accuracy of 97.5%.

Several deep learning methods for brain tumour localization have been presented in the literature. Havaei et al. [29] demonstrated automatic brain tumour localization using a DNN based on the CNNs with an average DICE score of 0.88.

In [30], a fully convolutional residual neural network (FCR-NN) combined with a fully convolutional architecture with optimization gains from residual identity was developed. The network consists of 22 complex layers, and the achieved DICE score was 0.87. Casamitjana et al. [31] proposed the use of 3D CNNs for the segmentation of abnormal parts in MRIs based on a network using multi-resolution features and reported a DICE score of 0.84.

Zhao et al. [32] used a fully convolutional neural network (FCNN) and conditional random fields (CRFs) as a post-processing step to segment brain tumours, obtaining a DICE score of 0.87. Pereira et al. [33] presented a preliminary study on brain tumour segmentation based on a fully convolutional network (FCN) with standardization of MRI histograms as a preprocessing stage, which demonstrated a DICE score of 0.85. More recently, Pereira et al. [34] examined a complete segmentation system that includes preprocessing, CNN, and post-processing stages. The new system achieved an improved DICE score of 0.88 at the expense of increased system complexity.

3 Methods

Currently, deep learning models are of considerable research importance. Deep learning methods provide high efficiency and the ability to process numerous MRIs from databases [35]. This study focuses on CNNs, which have gained popularity among researchers for object recognition and biological image segmentation.

The developed diagnosis system has two phases for tumour detection and localization. The first phase converts tumour detection into an MRI classification problem and then categorizes MRIs into normal and abnormal images. The second phase focuses on tumour localization inside the abnormal images. Figure 1 shows a block diagram of the proposed system. MRIs are delivered to the system’s input, and then, the preprocessing tasks are performed. A feature extraction tool is used to select the relevant features, and a classification tool is applied to decide whether these features belong to a normal or an abnormal MRI. To achieve accurate results, the five-layer R-CNN has been trained and evaluated using MRIs from BraTS 2013 as a standard benchmark for tumour localization.

Fig. 1

A flowchart of the developed CNN-based system for brain tumour detection and localization. The flowchart shows the two system phases (detection and localization phases) and highlights the two different types of convolutional neural network (CNN) used for each system phase. The AlexNet, VGG-16, and VGG-19 models fit the CNN in the detection phase, and a five-layer fully developed R-CNN was used in the localization phase

3.1 Brain tumour detection phase

3.1.1 Input image and preprocessing

Due to the ability of magnetic resonance imaging to provide information about the soft tissues in the brain, databases of MRIs were used in this study as an input to the developed system. The goal of the preprocessing stage is to ensure the MRI readiness for the next stage. It improves image quality, reduces the noise level, and enhances the overall image resolution by employing classical filtering techniques [36]. The preprocessing stage also adjusts the image size and type to a 227×227 RGB image, which is provided to the trainable CNN in the feature extraction stage.

3.1.2 Convolutional neural network (CNN)

The CNN is a deep learning model that has been improved over the past two decades. CNNs can learn features automatically from input data, especially images, as is the case in this research [37]. The convolutional layers convolve the input image with kernels (weights) to obtain a feature map. The weights of the kernels connect the feature map units to the previous layer. AlexNet and two other CNN models, namely, VGG-16 and VGG-19, were used and evaluated in the tumour detection phase. However, the theoretical foundation presented in this section focuses on the AlexNet model. The AlexNet model was selected because it is easy to modify, it reduces over-fitting through a dropout layer, and it trains faster by using a rectified linear unit (ReLU).

The AlexNet model utilized for tumour detection is a pre-trained network from the MatConvNet toolbox [12]. It consists of 23 layers with weights. The first layer defines the input dimensions, with an input image size of 227×227×3. A series of convolutional layers is interleaved with ReLU and max-pooling layers. The final layer is a classification layer with 1000 classes.

The tumour detection phase employs the CNN for feature extraction, and only a few layers in the CNN are suitable for this goal. The CNN uses nine layers, as shown in Fig. 2: an image input, two convolution, two ReLU, two cross-channel normalization (Norm), and two max-pooling layers. Layers 2 and 6 are convolutional layers with 11×11×3 and 5×5×48 convolutions, respectively. If the input x and the weight w are defined on integer indices, the 2D discrete convolution can be defined as in Eq. 1:

$$ y(i,j)=x(i,j)*w(i,j) = \sum\limits_{m}\sum\limits_{n}x(m,n)w(i-m,j-n) $$
(1)

where m and n are the summation indices, whose range is set by the size of the convolution matrix, x is the input from the previous layer, w is the kernel (weights), and y is the output of the convolutional layer.

Fig. 2

The architecture of the CNN used in this study. The name of each network layer is provided. The figure shows a snapshot of the whole network architecture

To facilitate the convolution process, the dimensions of the convolutional kernel and the CNN input should be matched. If the CNN input is a multidimensional array, then the kernel should be a multidimensional array as well. If the CNN input is a two-dimensional image [38], a two-dimensional kernel w is used, as presented in Eq. 1.
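As an illustration of Eq. 1, the following is a minimal pure-Python sketch of a full 2D discrete convolution. The function name `conv2d_full` and the list-of-lists representation are assumptions made for this example and are not part of the system described here. Note also that many deep learning toolboxes compute a cross-correlation (no kernel flip) over the valid region only, so this sketch mirrors the equation rather than any particular toolbox.

```python
def conv2d_full(x, w):
    """Full 2D discrete convolution: y(i, j) = sum_m sum_n x(m, n) * w(i - m, j - n).

    x and w are rectangular lists of lists (hypothetical representation).
    The output has (rows_x + rows_w - 1) x (cols_x + cols_w - 1) entries.
    """
    x_rows, x_cols = len(x), len(x[0])
    w_rows, w_cols = len(w), len(w[0])
    y = [[0.0] * (x_cols + w_cols - 1) for _ in range(x_rows + w_rows - 1)]
    for m in range(x_rows):
        for n in range(x_cols):
            for p in range(w_rows):
                for q in range(w_cols):
                    # Kernel index (p, q) equals (i - m, j - n),
                    # so this product contributes to y(m + p, n + q).
                    y[m + p][n + q] += x[m][n] * w[p][q]
    return y


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    kernel = [[1, 0], [0, 1]]
    for row in conv2d_full(image, kernel):
        print(row)
```

A two-dimensional input is paired with a two-dimensional kernel here, which matches the dimension-matching requirement stated above.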

A sigmoidal activation function is then applied to the output y of the convolutional layers (layers 2 and 6) obtained from Eq. 1, giving the new output f as a function of y according to Eq. 2 or 3:

$$ f(y) = \tanh (y) $$
(2)
$$ f(y) = \frac{1}{1+e^{-y}} $$
(3)

The ReLU layers, i.e., layers 3 and 7, apply the rectified linear activation function to the neuron output y as shown in Eq. 4:

$$ f(y) = \max(0,y) $$
(4)
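The three activation functions in Eqs. 2, 3, and 4 can be sketched for scalar inputs as follows. The function names are assumptions chosen for this example, and the two-branch form of the logistic function is a common numerical-stability device rather than something specified in the text.

```python
import math


def tanh_activation(y):
    """Eq. 2: f(y) = tanh(y)."""
    return math.tanh(y)


def logistic_sigmoid(y):
    """Eq. 3: f(y) = 1 / (1 + exp(-y)), evaluated without overflow."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    # For negative y, use the equivalent form exp(y) / (1 + exp(y)).
    e = math.exp(y)
    return e / (1.0 + e)


def relu(y):
    """Eq. 4: f(y) = max(0, y)."""
    return max(0.0, y)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, tanh_activation(value), logistic_sigmoid(value), relu(value))
```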

Layers 4 and 8 are the cross-channel normalization layers with five channels. Layers 5 and 9 are the 3×3 max-pooling layers. A pooling layer is simple: it takes each K×K region and outputs a single value, the maximum in that region, so an N×N input divided into non-overlapping regions yields an \(\frac {N}{K}\times \frac {N}{K}\) output. Layer 9 provides the extracted features in a dimension of 1×43264, which are then fed to the SVM to carry out the image classification.
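The pooling rule can be sketched as below, assuming a square N×N feature map, non-overlapping K×K windows, and N divisible by K; the function name `max_pool` is hypothetical. Pooling with overlapping windows, as in the original AlexNet design, produces a different output size than N/K, so this sketch illustrates only the N/K relation quoted in the text.

```python
def max_pool(feature_map, k):
    """Non-overlapping k x k max pooling on a square N x N feature map.

    Assumes N is divisible by k; the result is an (N / k) x (N / k) map.
    """
    n = len(feature_map)
    out = n // k
    return [
        [
            max(
                feature_map[i * k + a][j * k + b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]


if __name__ == "__main__":
    fmap = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    print(max_pool(fmap, 2))
```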

3.1.3 Error-correcting output codes support vector machines (ECOC-SVMs)

Dietterich and Bakiri presented the error-correcting output codes (ECOC) approach, which draws on the way errors are corrected when data are sent through a noisy channel. The approach uses error-correcting codes to combine binary classifiers into a multi-class classifier [39]. It has been successfully used in many applications, such as face recognition [40], face verification [41], text recognition [42], and digital manuscript classification [43].

The error-correcting output codes support vector machine (ECOC-SVM) is a method that combines SVMs and ECOC. It was proposed to solve online identification and feature extraction problems. When ECOC-SVMs are applied to multi-class classification, the k-class classification problem is transformed into L binary classification problems by the ECOC matrix. The ECOC-SVM functions according to the following steps [44]:

  1. Training: The classes are divided into two subsets (0 or 1) for each row of the coding matrix, which defines an SVM binary partition. The SVM takes the classes with a value of 0 as the first class and the classes with a value of 1 as the second class. Each column of the matrix is the codeword assigned to the corresponding class. All of the L binary classifiers are trained according to the rows of the ECOC matrix, where L is the dimensionality of the new feature space and the length of the codeword.

  2. Testing: The test data are fed into the learned classifiers. The L classifiers recognize the samples and generate the output vector Z. The Hamming distance between the output vector and each codeword in the code matrix is computed as shown in Eq. 5. The classifier chooses the class with the minimal distance.

$$ c = \arg\min_{i} d(Z, H_{i}), \quad d(Z, H_{i}) = \sum\limits_{j=1}^{L} |Z_{j}- H_{i,j}|, ~i=1,2,...,k $$
(5)

where H is the code matrix, \(H_{i}\) is the codeword of class i, and \(H_{i,j}\) is the jth element of that codeword.
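The decoding rule in Eq. 5 can be sketched as follows. The three-class, five-bit code matrix `CODE_MATRIX` is purely illustrative and is not the matrix used in this study. It is stored with one codeword per class so that `CODE_MATRIX[i][j]` plays the role of \(H_{i,j}\), and the function names are assumptions.

```python
# Hypothetical code matrix: one 5-bit codeword per class (k = 3, L = 5).
# Any two codewords differ in at least 3 bits, so one flipped bit is tolerated.
CODE_MATRIX = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
]


def hamming_distance(z, codeword):
    """d(Z, H_i) = sum_j |Z_j - H_ij| for binary vectors of equal length."""
    return sum(abs(z_j - h_j) for z_j, h_j in zip(z, codeword))


def ecoc_decode(z, code_matrix):
    """Return the index of the class whose codeword is closest to z."""
    distances = [hamming_distance(z, codeword) for codeword in code_matrix]
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    # Output of the L binary classifiers with the last bit flipped
    # relative to the codeword of class 1.
    z = [0, 1, 1, 0, 0]
    print(ecoc_decode(z, CODE_MATRIX))
```

In this example the corrupted output vector is still assigned to class 1, which is the error-correcting behaviour the codewords are designed to provide.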

3.2 Brain tumour localization phase

Since the BraTS 2013 database includes 3D MRIs, we used its MRIs slice by slice as inputs for the CNN. Our method handles each slice as a 2D image [29], and only T2 images with corresponding segmented images were selected. In the preprocessing stage, we detect the tumour region from the segmented image using a bounding box method. The T2 image with the tumour bounding box is provided to the R-CNN block to train the network. As a post-processing step, we measure the properties of regions in the tested image and select the region that overlaps most with the R-CNN detection as the tumour region.
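A minimal sketch of the two geometric steps mentioned above is given here, under stated assumptions: the segmented slice is a binary mask stored as a list of lists, boxes are `(top, left, bottom, right)` tuples with inclusive pixel indices, and overlap is measured as intersection area. These function names and conventions are hypothetical and are not taken from the implementation used in this study.

```python
def tumour_bounding_box(mask):
    """Smallest box (top, left, bottom, right) enclosing all non-zero pixels.

    Returns None when the mask contains no tumour pixels.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (min(rows), min(cols), max(rows), max(cols))


def intersection_area(box_a, box_b):
    """Number of pixels shared by two boxes with inclusive coordinates."""
    top, left = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    bottom, right = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if bottom < top or right < left:
        return 0
    return (bottom - top + 1) * (right - left + 1)


def most_overlapping_region(candidate_boxes, detected_box):
    """Pick the candidate region that overlaps most with the detected box."""
    return max(candidate_boxes, key=lambda box: intersection_area(box, detected_box))


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(tumour_bounding_box(mask))
```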

3.2.1 5-layer region-based convolutional neural network (R-CNN)

R-CNNs offer high object detection accuracy. The R-CNN method trains a CNN to classify the image regions into an object or a background category. It uses the log loss approach to fine-tune the image regions into objects. Then, an SVM approach is applied to detect objects [45].

The R-CNN architecture used in the tumour localization phase consists of five layers. The network first receives the whole image through the image-input layer (layer 1), which accepts input data with a size of 28×28×1, and then passes it to a convolutional layer (layer 2) with a filter size of 5 and 20 filters for detecting multiple class-specific objects. A fully connected layer (layer 3) with two outputs is used to predict a single box for tumour localization. The softmax layer (layer 4) applies the softmax function, also known as the normalized exponential, as the output unit activation function [46–48], as presented in Eq. 6:

$$ y_{r}=\frac{\exp (a_{r})}{ {\sum\nolimits}_{j = 1}^{k}\exp (a_{j})} $$
(6)

where a is the input vector with k dimensions and y is the output vector with k dimensions. The denominator \({\sum \nolimits }_{j = 1}^{k}\exp (a_{j})\) normalizes the output to ensure that \({\sum \nolimits }_{r = 1}^{k}y_{r} = 1\) and \(0\le y_{r}\le 1\), which can be represented graphically as a layer with k neurons.
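Eq. 6 can be sketched in a few lines. Subtracting the largest input before exponentiating is a standard numerical-stability device that leaves the result unchanged; it is an addition for this example and is not described in the text.

```python
import math


def softmax(a):
    """Eq. 6: y_r = exp(a_r) / sum_j exp(a_j) for a list of k inputs."""
    shift = max(a)  # subtracting a constant from all inputs cancels in the ratio
    exps = [math.exp(value - shift) for value in a]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    outputs = softmax([1.0, 2.0, 3.0])
    print(outputs, sum(outputs))
```

The outputs are non-negative and sum to one, so with the two outputs of layer 3 they can be read as the probabilities of the two classes.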

The classification layer used as the final layer is named after the loss used for the training process. It returns a neural network classification output. For multi-class classification, the loss (error) function is the cross entropy function shown in Eq. 7:

$$ E(\theta)= -\sum\limits_{i = 1}^{n}\sum\limits_{j = 1}^{k}t_{ij}\ln y_{j}(x_{i},\theta) $$
(7)

where θ is the parameter vector, \(t_{ij}\) is the indicator that the ith sample belongs to the jth class, and \(y_{j}(x_{i},\theta)\) is the output for sample i and class j.
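The loss in Eq. 7 can be sketched as below, assuming one-hot target rows and network outputs that are strictly positive wherever the target is 1. The function name and the nested-list layout are assumptions for this example.

```python
import math


def cross_entropy(targets, outputs):
    """Eq. 7: E = -sum_i sum_j t_ij * ln(y_j(x_i)).

    targets[i][j] is 1 if sample i belongs to class j and 0 otherwise;
    outputs[i][j] is the predicted probability of class j for sample i.
    Terms with t_ij = 0 contribute nothing and are skipped.
    """
    return -sum(
        t * math.log(y)
        for t_row, y_row in zip(targets, outputs)
        for t, y in zip(t_row, y_row)
        if t != 0
    )


if __name__ == "__main__":
    t = [[1, 0], [0, 1]]
    y = [[0.5, 0.5], [0.25, 0.75]]
    print(cross_entropy(t, y))
```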

4 Results and discussion

4.1 Experimental environment setup

The image database used for evaluating the tumour detection phase was extracted from the RIDER Neuro MRI database [17]. The input database consists of brain MRIs from 19 patients. These images were obtained approximately 2 days apart. T1-weighted and T2-weighted MRIs were used. The database contains a total of 349 MRIs, including 109 normal images and 240 abnormal images. A set of 43264 features was extracted from each MRI and was later used to train the SVM. The SVM was trained using 122×43264 features in total. The specifications of the utilized database are presented in Table 2. A sample of the MRI database used in the experimental work is shown in Fig. 3.

Fig. 3

Representative images from the MRIs database that was used in the tumour detection phase. Normal MRIs without brain tumours are in the top row, while abnormal images with brain tumours present are in the bottom row

Table 2 The configurations of the MRIs database in the training and testing datasets used in tumour detection phase

To evaluate the tumour localization phase, a dataset of 804 3D MRIs extracted from the BraTS 2013 database was used for the localization accuracy assessment. The database consists of 20 high-grade (HG) and 10 low-grade (LG) patients and T1, T1c, T2, and FLAIR type MRI modalities. A set of 593 MRIs was used for R-CNN training and validation, and a set of 211 MRIs was used for the testing dataset. The 3D images were converted into 2D slices, and the T2 MRI modality was used.

4.2 Accuracy evaluation criteria

To evaluate the two system phases, the following standard metrics were used: sensitivity, which represents the proportion of actual positives that are correctly classified; specificity, which indicates the proportion of actual negatives that are correctly classified; and accuracy, which is the proportion of all cases, positive and negative, that are correctly classified. These three metrics were calculated using Eqs. 8, 9, and 10, respectively [18, 19]:

$$ \text{Sensitivity} = \frac{\sigma}{\sigma + \Phi} $$
(8)
$$ \text{Specificity} = \frac{\eta}{\eta + \Psi} $$
(9)
$$ \text{Accuracy}= \frac{\sigma + \eta}{\sigma + \eta + \Phi + \Psi} $$
(10)

where true positives (σ) are the positive cases correctly classified as positive, true negatives (η) are the negative cases correctly classified as negative, false positives (Ψ) are the negative cases incorrectly classified as positive, and false negatives (Φ) are the positive cases incorrectly classified as negative.

In addition to sensitivity, accuracy, and specificity, other parameters have been considered for evaluating tumour detection and localization phases. The new parameters were balanced accuracy (BA) = (sensitivity + specificity)/2, positive predictive value (PPV) = σ/(σ + Ψ), and negative predictive value (NPV) = η/(η + Φ) [11]. Additionally, DICE scores are considered a common parameter for tumour localization approaches.
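The metrics above can be computed from the four confusion counts as sketched here. The counts in the usage example are invented for illustration and are not results from this study. The DICE function uses the standard definition 2|A∩B|/(|A|+|B|) on sets of pixel coordinates, which is not spelled out in the text, and it returns 1.0 for two empty sets by convention.

```python
def detection_metrics(tp, tn, fp, fn):
    """Eqs. 8-10 plus BA, PPV, and NPV from the confusion counts.

    tp, tn, fp, fn correspond to sigma, eta, Psi, and Phi in the text.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def dice_score(predicted, truth):
    """DICE overlap between two sets of pixel coordinates."""
    if not predicted and not truth:
        return 1.0  # convention chosen for this sketch
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(detection_metrics(tp=8, tn=6, fp=2, fn=4))
    print(dice_score({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))
```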

4.3 Tumour detection accuracy analysis

The performance of the CNN is measured in every network layer. The goal of this step is to identify the optimum number of network layers for achieving the highest accuracy and the shortest feature extraction time. The measured performance of each layer is recorded in Table 3. It is demonstrated in Table 3 that setting the number of layers equal to 9 provided the best performance.

Table 3 The obtained number of features, the consumed feature extraction time, and the accuracy of the different CNN layers

Table 4 shows the best accuracy obtained from the three CNN models used in this study and the corresponding layer number. The network size was optimized to give the best performance in the learning phase and was then fixed, so that the validation and testing phases use the same network size as the learning phase. This is to generalize the CNN model for all data samples.

Table 4 Comparison of the performance of the three CNN models used in the tumour detection phase

To further investigate the performance of classical learning models against the proposed deep CNN-based model, a classical brain tumour diagnosis method has been applied and studied using the same database configurations as in Table 2. The proposed CNN system achieved 99.55% accuracy, while the classical method accomplished 66.96%. A short comparison between the performance of the previous classical method and the newly proposed deep learning system using the same database is presented in Table 5. The obtained results in Table 5 show the superiority of the deep learning method in terms of brain tumour detection accuracy, which is reflected in the MRI classification.

Table 5 Comparison of the obtained tumour detection accuracy from the proposed CNN-based system to previous work in which a classical approach was used with the same database as a unified benchmark

The proposed CNN-based system has achieved 99.55% accuracy for tumour diagnosis when selecting nine CNN layers. To ensure the credibility of the obtained results, a comparison with results from the literature was performed. Table 6 compares the outcome of the proposed CNN-based system with those of three popular methods reported in the literature.

Table 6 Comparison of the proposed CNN-based brain tumour detection approach against some former state-of-the-art methods

4.4 Tumour localization accuracy analysis

Simulation experiments have been conducted to show the performance of the proposed system in fulfilling the tumour localization task. A comparison with other studies found in the literature is shown in Table 7. The proposed system is superior to the others in terms of the network size, sensitivity, and DICE metrics but not the PPV metric. A sample of visual results from the MRI database used in the localization phase of the experimental work is shown in Fig. 4.

Fig. 4

Visual results in the axial view from the localization phase. The top row shows the T2 modality. The second row shows the ground truth. The bottom row shows the segmented images

Table 7 Performance analysis of the developed localization R-CNN-based method compared to other methods in the literature

5 Conclusions

This paper has proposed a two-phase multi-model deep learning-based system for brain tumour detection and localization from MRIs. The main goals of this study are to classify MRIs into normal and abnormal images according to the absence or presence of a brain tumour and to accurately localize the tumour within the abnormal MRIs. The first system phase employed CNN and ECOC-SVM approaches for feature extraction and classification, respectively. A five-layer R-CNN was used for tumour localization in the second system phase. The tumour detection phase was evaluated using 349 MRIs extracted from the RIDER Neuro MRI database, and the method achieved an accuracy of 99.55%. The tumour localization phase was evaluated using 804 3D MRIs from the BraTS 2013 database, and a DICE score of 0.87 was achieved. These results demonstrate the superiority of the proposed deep learning-based method for tumour detection and of the whole system’s performance with respect to both tumour detection and localization measures.