Automated brain tumor segmentation on multi-modal MR image using SegNet

The potential of improving disease detection and treatment planning comes with accurate and fully automatic algorithms for brain tumor segmentation. Glioma, a type of brain tumor, can appear at different locations with different shapes and sizes. Manual segmentation of brain tumor regions is not only time-consuming but also prone to human error, and its performance depends on pathologists’ experience. In this paper, we tackle this problem by applying a fully convolutional neural network SegNet to 3D data sets for four MRI modalities (Flair, T1, T1ce, and T2) for automated segmentation of brain tumor and subtumor parts, including necrosis, edema, and enhancing tumor. To further improve tumor segmentation, the four separately trained SegNet models are integrated by post-processing to produce four maximum feature maps by fusing the machine-learned feature maps from the fully convolutional layers of each trained model. The maximum feature maps and the pixel intensity values of the original MRI modalities are combined to encode interesting information into a feature representation. Taking the combined feature as input, a decision tree (DT) is used to classify the MRI voxels into different tumor parts and healthy brain tissue. Evaluating the proposed algorithm on the dataset provided by the Brain Tumor Segmentation 2017 (BraTS 2017) challenge, we achieved F-measure scores of 0.85, 0.81, and 0.79 for whole tumor, tumor core, and enhancing tumor, respectively. Experimental results demonstrate that using SegNet models with 3D MRI datasets and integrating the four maximum feature maps with pixel intensity values of the original MRI modalities has potential to perform well on brain tumor segmentation.


Introduction
Glioma is one of the most common types of primary tumor that occurs in the brain. Gliomas grow from glial cells and can be categorized into low and high grade gliomas. High grade gliomas (HGG) are more aggressive and highly malignant, with a life expectancy of at most two years, while low grade gliomas (LGG) can be benign or malignant and grow more slowly, with a life expectancy of several years [1]. Accurate segmentation of brain tumor and surrounding tissues such as edema, enhancing tumor, non-enhancing tumor, and necrotic regions is an important factor in assessment of disease progression, therapy response, and treatment planning in patients [2]. Multi-modal magnetic resonance imaging (MRI) is widely employed in clinical routine for diagnosis and monitoring tumor progression. MRI has been one of the most popular imaging techniques as it facilitates tumor analysis by visualizing its spread; it also gives better soft tissue contrast than other techniques like computed tomography (CT) and positron emission tomography (PET). Moreover, multi-modal MRI protocols are normally used to evaluate brain tumor tissues as they have the capability to separate different tissues using a specific sequence based on tissue properties. For example, T1-weighted images are good at separating healthy tissues in the brain, while T1ce (contrast enhanced) helps to separate tumor boundaries, which appear brighter because of the contrast agent. Edema around tumors is detected well in T2-weighted images, while FLAIR images are best for differentiating edema regions from cerebrospinal fluid (CSF) [3,4]. Gliomas have complex structure and appearance, and they require accurate delineation in images. Tumor components are often diffuse, with weak contrast. Their borders are often fuzzy and hard to distinguish from healthy tissue (white matter, gray matter, and CSF), making them hard to segment [5].
All these factors lead to time-consuming manual delineation, which is expensive and prone to operator bias. Automatic brain tumor segmentation using MRI would solve these issues by providing an efficient tool for reliable diagnosis and prognosis of brain tumors. Therefore, many researchers have considered automated brain tumor segmentation from MRI images.
Recently, convolutional neural networks (CNNs) have attracted attention in object detection, segmentation, and image classification. For the BraTS challenge, most CNN-based methods are patch-wise models [5][6][7]. These methods take only a small region as input to the network, which disregards the image content and label correlations. Additionally, these methods take a long time for training. CNN architecture is modified in several ways in fully convolutional networks (FCN). Specifically, instead of making probability distribution prediction patch-wise in CNN, FCN models predict one probability distribution pixel-wise [8]. In the method of Ref. [9], different MRI modalities are stacked together as different input channels into deep learning models. However, the correlation between different MRI modalities was not explicitly considered. To overcome this problem, we develop a feature fusion method to select the most effective information from different modalities. A model is proposed to deal with multiple MRI modalities separately and then incorporate spatial and sequential features from them for 3D brain tumour segmentation.
In this study, we first trained four SegNet models with 3D data sets, using the Flair, T1, T1ce, and T2 modalities as input data. The output of each SegNet model is four feature maps, which represent the scores of each pixel being classified as background, edema, enhancing tumor, and necrosis. For each class, the highest scores across the four SegNet models are extracted, giving four feature maps with the highest scores. These feature maps are combined with the pixel values of the original MRI modalities and are taken as the input to a DT classifier to further classify each pixel. Our results demonstrate that this proposed strategy can perform fully automatic segmentation of tumor and sub-tumor regions.
The main contributions of this paper are as follows:
• A brain tumour segmentation method that uses 3D data information from the neighbors of the slice in question to increase segmentation accuracy for single mode MR images.
• Effective combination of features extracted from multi-modal MR images, maximizing the useful information from different modalities of MR images.
• A decision tree-based segmentation method which incorporates features and pixel intensities from multi-modal MRI images, giving higher segmentation accuracy than single-modal MR images.
• Evaluation on the BraTS 2017 dataset showing that the proposed method gives state-of-the-art results.

Related work
Many methods have been investigated for medical image analysis; promising results have been provided by computational intelligence and machine learning methods in medical image processing [10]. The problem of brain tumour segmentation from multimodal MRI scans is still a challenging task, although recently various advanced methods of automated segmentation have been proposed to solve this task.
Here, we will review some of the relevant works for brain tumour segmentation. For machine learning methods other than deep learning, Gooya et al. [11], Zikic et al. [12], and Tustison et al. [13] present some typical works in this field. Discriminative learning techniques such as SVM, decision forests, and conditional random fields (CRFs) have been reviewed in Ref. [2].
One common aspect of classical discriminative models is that their implementation is based on predefined features, as opposed to deep learning models that automatically learn a hierarchy of increasingly complex features directly from data, resulting in more robust features [5]. Pereira et al. [7] used two different CNNs for the segmentation of LGG and HGG. The architecture in Ref. [5] involves two pathways, a local pathway that focuses on the information in a pixel's neighborhood, and a global pathway that captures global contextual information from an MRI slice to perform accurate brain tumour segmentation. A dual-stream 11-layer network with a 3D fully connected CRF as post-processing was presented in Ref. [14]. An adapted version of DeepMedic with residual connections was employed for brain tumour segmentation in Ref. [15].
Patch-wise methods contain many redundant convolutional calculations, but only explore spatially limited contextual features. To avoid using patches, FCN with deconvolution layers can be used to train an end-to-end, pixel-to-pixel CNN for pixel-wise prediction with the whole image as input [8]. Chang [16] demonstrated an algorithm that combines FCN and CRF. Shelhamer et al. [8] suggested using skip connections to join high-level features from deep decoding layers with appearance features from shallow encoding layers to recover spatial information lost during downsampling. This method has demonstrated promising results on natural images and is also applicable to biomedical images [17]. Ronneberger et al. [9] and Çiçek et al. [18] used the U-Net architecture, which consists of a down-sampling path to capture contextual features and a symmetric up-sampling path that enables accurate localization, with a 3D extension. However, depth information is ignored by approaches based on 2D. In contrast, Lai [19] used the depth information by implementing a 3D convolution model which utilizes the correlation between slices. A 3D convolution network, however, requires a large number of parameters and is prone to overfitting on a small dataset.
In Refs. [5,20], the input data to the deep learning methods were treated as different modality channels. Therefore, the correlation between them is not well used. The correlations between different MRI modalities are utilized in our proposed method by processing the 3D MRI data set of each MRI modality separately with a SegNet model, combining the feature maps of the last deconvolution layer of each trained SegNet model with the pixel intensity values of the original MRI modalities, and feeding them into a classifier.

Approach
Our brain tumor segmentation algorithm aims to locate the entire tumor volume and accurately segment the tumor into four sub-tumor parts. Our method has four main steps: a pre-processing step to construct 3D MRI datasets, a training step to fine-tune a pretrained SegNet for each MRI modality separately, a post-processing step to extract four maximum feature maps from the SegNet models' score maps, and a classification step to classify each pixel based on the maximum feature maps and the MRI pixel values. Figure 1 shows the pipeline of our proposed system using SegNet networks.

Data pre-processing
In our study, MRI intensity value normalization is important to compensate for MRI artifacts, such as motion and field inhomogeneity, and also to allow data from different scanners to be processed by a single algorithm. Therefore, we need to ensure that the value ranges match between patients and different modalities to avoid initial biases of the network.
Firstly, to remove unwanted artifacts, N4ITK bias field correction is applied to all MRI modalities [21]. If this correction is not performed in the pre-processing step, artifacts cause high false positives, resulting in poor performance. Figure 2 shows the effects of applying bias field correction to an MR image. Higher intensity values, which can lead to false positives in the predicted output results, are observed in the first scan near the bottom left corner. The second scan has better contrast near the edges after removing the bias.
Intensity values across MRI slices have been observed to vary greatly, so a normalization preprocessing step is also applied in addition to bias field correction so as to bring the mean intensity value and variance close to 0 and 1, respectively. Equation (1) shows how to compute the normalized slice value $I_n$:
$$ I_n = \frac{I - \mu}{\sigma} \tag{1} $$
where $I$ is the original intensity value of the MRI slice, and $\mu$ and $\sigma$ are the mean and standard deviation of $I$, respectively. Additionally, removing the top and bottom 1% of intensity values during the normalization process brings the intensity values within a coherent range across all images for the training phase. To remove a significant portion of unnecessary zeros in the dataset and to save training time by reducing the huge memory requirements for 3D data sets, we trimmed some black parts of the image background from the data for all modalities to get input images of size 192 × 192.
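The normalization and cropping described above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB code, and it rests on assumptions the text does not settle: the 1% tails are clipped (not discarded), statistics are computed over nonzero pixels only, and the 192 × 192 window is a center crop. The names `normalize_slice` and `center_crop` are hypothetical.

```python
import numpy as np

def normalize_slice(img, low_pct=1.0, high_pct=99.0):
    """Clip the extreme 1% tails, then z-score a 2D slice.

    Assumption: mean and standard deviation are taken over nonzero
    (brain) pixels only; background pixels stay at zero.
    """
    img = img.astype(np.float64)
    brain = img > 0
    out = np.zeros_like(img)
    if not brain.any():
        return out
    lo, hi = np.percentile(img[brain], [low_pct, high_pct])
    clipped = np.clip(img, lo, hi)
    mu = clipped[brain].mean()
    sigma = clipped[brain].std()
    if sigma > 0:
        out[brain] = (clipped[brain] - mu) / sigma
    return out

def center_crop(img, size=192):
    """Assumption: the trimmed background is removed by a center crop."""
    h, w = img.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]
```

Applied to a 240 × 240 slice, `center_crop(normalize_slice(slice_2d))` would return a 192 × 192 array whose brain pixels have approximately zero mean and unit variance.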
As shown in Fig. 1, the main step in pre-processing is 3D database construction. Since there are four modalities in the MRI dataset for each patient, we took them as four independent inputs. When processing the jth slice, we also use the (j − 1)th and (j + 1)th slices to take advantage of 3D image information. To do so, the three adjacent slices for each modality are taken as three color channels of an image and used as 3D inputs.
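A small sketch of this three-slice construction is given below. The function name `stack_adjacent_slices` is hypothetical, and the handling of the first and last slices (which lack one neighbor) is an assumption, since the text does not describe it; here the edge slice is simply repeated.

```python
import numpy as np

def stack_adjacent_slices(volume):
    """Turn a (num_slices, H, W) volume into (num_slices, H, W, 3) inputs.

    Channels hold slices (j - 1, j, j + 1) for each slice j.
    Assumption: the first and last slices reuse themselves as the
    missing neighbor (edge padding along the slice axis).
    """
    padded = np.pad(volume, ((1, 1), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=-1)
```

Each modality volume would be passed through this function separately, so that every SegNet model receives three-channel images built from a single modality.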

Brain tumor image segmentation by SegNet networks
The semantic segmentation model in Fig. 3 takes full-size images as input for feature extraction in an end-to-end manner. The pretrained SegNet is used, and its parameters are fine-tuned using images with manually annotated tumor regions. In the testing process, the final SegNet model is used to create predicted segmentation masks of tumor regions for unseen images. The motivation for using SegNet networks instead of other deep learning networks is that SegNet has a small number of parameters, does not need high computational resources like DeconvNet [23], and is easier to train end-to-end. Moreover, in a U-Net network [9], entire feature maps in the encoders are transferred to the corresponding up-sampling decoders and concatenated to give decoder feature maps, which leads to high memory requirements, while in SegNet only pooling indices are reused, needing less memory. In our network architecture, the main idea used from FCN is to change the fully connected layers of VGG-16 into convolutional layers. This not only helps in retaining higher resolution feature maps at the deepest encoder outputs, but also reduces the number of parameters in the SegNet encoder network significantly (from 134M to 14.7M). This enables the classification net to output a dense feature map which keeps spatial information [22].
The SegNet architecture consists of a downsampling (encoding) path and a corresponding upsampling (decoding) path, followed by a final pixelwise classification layer. In the encoder path, there are 13 convolutional layers which match the first 13 convolutional layers in the VGG16 network. Each encoder layer has a corresponding decoder layer; therefore, the decoder network also has 13 convolutional layers.
The output of the final decoder layer is fed into a multi-class soft-max classifier to produce class probabilities for each pixel independently.
The encoder path consists of five convolution blocks, each of which is followed by a max-pooling operation with a 2 × 2 window and stride 2 for downsampling. Each convolution block is constructed by several layers of 3 × 3 convolution combined with batch normalization and element-wise rectified linear nonlinearity (ReLU). There are two layers in each of the first two convolution blocks, and three layers for the next three blocks. The decoder path has a symmetric structure to the encoder path except that the max-pooling operation is replaced by an upsampling operation. Upsampling takes the outputs of the previous layer and the output of the max pooling indices of the corresponding encoding layer as input. The output of the final decoder, which is a high dimensional feature representation, is fed into a soft-max classifier layer, which classifies each pixel independently. See Fig. 3. Subsequently, the output of the soft-max classifier is a K channel image, where K represents the number of desired classes, with probability value at each pixel.
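To make the role of the pooling indices concrete, the following NumPy sketch shows 2 × 2 max pooling with stride 2 that records where each maximum came from, and the matching unpooling step that puts values back at those positions. It illustrates the mechanism on a single 2D map only and is not the SegNet implementation used in the paper; the function names and the `ENCODER_BLOCK_DEPTHS` constant are hypothetical.

```python
import numpy as np

# Convolution layers per encoder block, as described in the text.
ENCODER_BLOCK_DEPTHS = [2, 2, 3, 3, 3]  # sums to the 13 VGG-16 conv layers

def max_pool_with_indices(x):
    """2x2 max pooling, stride 2, on an (H, W) map with even H and W.

    Returns the pooled map and, per window, the position (0..3, row-major)
    of the maximum inside that window.
    """
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    idx = windows.argmax(axis=-1)
    pooled = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool(pooled, idx):
    """Place each pooled value at its recorded position; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    return (windows.reshape(ph, pw, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * 2, pw * 2))
```

Because only `idx` has to be kept for the decoder, rather than the full encoder feature map, this scheme needs less memory than concatenating encoder features as in U-Net. In SegNet the sparse unpooled map is then densified by the decoder's convolution layers.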

Post-processing
As described in Section 3.2, four SegNet models are adapted and trained separately for segmentation of brain tumors from multi-modal MR images. The earlier layers of the SegNet models learn simple features like circles and edges, while the deeper layers learn complex and useful finer features. The machine-learned features in the last deconvolution layer of each SegNet model form four score maps, related to the four classification labels (background, necrosis, edema, and enhancing tumor). The four highest score maps are constructed from the resulting 16 feature maps. The values in each highest-activation feature map represent strong features that include all hierarchical features (at higher resolution), helping to increase the classification performance. To further increase the information for classification, a feature vector is generated by combining the four highest score maps with the pixel intensity values of the original MRI modalities. Finally, the encoded feature vector is passed to a DT classifier to classify each MRI image voxel into tumor and sub-tumor parts. The reason for using DT as the classifier in this work is that it has been shown to provide high performance for brain tumour segmentation [2]. The selection process for the highest feature maps and their location in the SegNet architecture are illustrated in Fig. 4.
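The fusion step can be sketched as below: 16 score maps (4 modalities × 4 classes) are reduced to 4 maps by taking the per-class, per-pixel maximum across modalities, and the result is concatenated with the 4 modality intensities to give 8 features per pixel. The array layouts and function names are assumptions made for illustration.

```python
import numpy as np

def fuse_max_score_maps(score_maps):
    """score_maps: (4 modalities, 4 classes, H, W).

    Returns (4 classes, H, W) holding, for each class and pixel,
    the highest score across the four modality-specific models.
    """
    return score_maps.max(axis=0)

def build_feature_vectors(score_maps, intensities):
    """intensities: (4 modalities, H, W).

    Returns an (H * W, 8) matrix: 4 fused scores followed by the
    4 modality intensities for every pixel, in row-major pixel order.
    """
    fused = fuse_max_score_maps(score_maps)
    features = np.concatenate([fused, intensities], axis=0)  # (8, H, W)
    return features.reshape(features.shape[0], -1).T
```

Each row of the returned matrix is one pixel's feature vector, which is the form a per-pixel classifier such as a decision tree would expect.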

SegNet Max DT
As described above, the four highest score maps are combined with pixel intensity values and considered as feature vectors. Then, the feature vectors are presented to a DT classifier. In this phase, the maximum number of splits or branch points is specified to control the depth of the designed tree.
Different tree depths of the DT classifier were examined and tuned on the training datasets. Optimal generalization and accuracy were obtained from a tree with depth 15. Five-fold cross-validation was used to evaluate the classification accuracy.
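As a rough analogue of this classification stage, the sketch below trains a depth-limited decision tree with five-fold cross-validation using scikit-learn. This is a substitution: the paper used MATLAB's classification learner, whose "maximum number of splits" setting is not the same parameter as scikit-learn's `max_depth`. The data here are synthetic placeholders, not BraTS features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for per-pixel features: 4 fused scores + 4 intensities.
X = rng.normal(size=(2000, 8))
# Toy labels in {0, 1, 2, 3}: the class whose (fake) score is highest.
y = X[:, :4].argmax(axis=1)

# Depth 15 follows the value reported in the text.
clf = DecisionTreeClassifier(max_depth=15, random_state=0)

# With an integer cv and a classifier, scikit-learn uses stratified folds.
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
```

In practice, `X` would be built from the fused score maps and modality intensities of the training patients, and the depth would be chosen by comparing `mean_accuracy` across candidate values.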

Training and implementation details
The proposed algorithm was implemented using MATLAB 2018a and run on a PC with an Intel Core i7 CPU with 16 GB RAM using Windows 7. Our implementation was based on the MATLAB deep learning toolbox for semantic segmentation and its classification learner toolbox for training the DT classifier. The whole training process for each model took approximately 3 days on a single NVIDIA GPU Titan XP. We updated the loss function on the training set using stochastic gradient descent, with parameters set as follows: learning rate = 0.0001, maximum number of epochs = 80.

Experiments and results
All 285 patient subjects with HGG and LGG in the BraTS 2017 dataset were included in this study [2,24]. 75% of the patients (158 HGG and 57 LGG) were used to train the deep learning model and 25% (52 HGG and 18 LGG) were assigned to the testing set. For each patient, there were four types of MRI sequences (Flair, T1, T1ce, and T2). All images were segmented manually by one to four raters (using 3 labels, 1: the necrotic and non-enhancing tumor, 2: the peritumoral edema, 4: GD-enhancing tumor). The segmentation ground truth for each subject was approved by experienced neuro-radiologists. Figure 5 shows the MRI modalities and their ground truth.
The model performance was evaluated on the test set. For practical clinical applications, the tumor structures are grouped into three different tumor regions:
• The complete tumor region, including all four intratumor classes (necrosis, non-enhancing tumor, edema, and enhancing tumor; labels 1, 2, and 4).
• The core tumor region (as above but excluding edema regions; labels 1 and 4).
• The enhancing tumor region (only label 4).
For each tumor region, the segmentation results were evaluated quantitatively using the F-measure, which provides an intersection measurement between the manually defined brain tumor regions and the segmentation prediction results of the fully automatic method, as follows:
$$ F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
where Precision is the fraction of predicted tumor voxels that are truly tumor, and Recall is the fraction of true tumor voxels that are predicted as tumor.
From our preliminary results, we observed that our 3D model can detect brain tumors accurately even though we trained on each MRI modality separately instead of combining the four MRI modalities as input as in other studies. The high accuracy comes from the fact that the network architecture is able to capture 3D fine details of tumor regions from adjacent MRI slices (j − 1, j, j + 1) of the same modality. Consequently, the convolutional layers can extract more features, which is extremely helpful in improving the performance of brain tumor segmentation. Moreover, relatively accurate brain tumor segmentation was achieved by extracting the four highest feature maps combined with the pixel intensity values of the original MRI images. The score maps are obtained from the last deconvolution layer in each SegNet model because this layer includes all hierarchical features that contain finer details (at higher resolution), which gives accurate brain tumor detection results. Table 1 gives evaluation results for the proposed method on the BraTS 2017 Training dataset for the four MRI modalities, while Table 2 compares our method with other methods.
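The region-wise evaluation can be sketched as follows. The F-measure is computed here in its overlap form, 2·TP / (|prediction| + |ground truth|), which is algebraically equal to the harmonic mean of precision and recall. The names `REGIONS`, `f_measure`, and `region_scores` are hypothetical, and returning 1.0 when both masks are empty is an assumption.

```python
import numpy as np

# BraTS labels grouped into the three evaluated regions.
REGIONS = {
    "whole": (1, 2, 4),
    "core": (1, 4),
    "enhancing": (4,),
}

def f_measure(pred_mask, true_mask):
    """F-measure of two boolean masks: 2*TP / (|pred| + |true|)."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    denom = pred_mask.sum() + true_mask.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(2.0 * tp / denom)

def region_scores(pred_labels, true_labels):
    """Compute the F-measure per region from integer label maps."""
    return {
        name: f_measure(np.isin(pred_labels, labels),
                        np.isin(true_labels, labels))
        for name, labels in REGIONS.items()
    }
```

For example, with `true = np.array([[0, 1, 2], [4, 4, 0]])` and `pred = np.array([[0, 1, 2], [4, 2, 0]])`, one enhancing pixel is mislabeled as edema, so the whole-tumor score stays at 1.0 while the core and enhancing scores drop to 0.8 and 2/3.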
From Table 1 it can be seen that SegNet Max DT performs better than the individual SegNet models. As explained in Section 3.4, only the highest scores for each specific sub-tumor region are selected for classification, which is why we get the highest accuracy using SegNet Max DT. Table 2 shows that our method gives better results in core and enhancing tumor segmentation, though the complete tumor segmentation accuracy is not better than that of Refs. [25] and [26]. This is because our method has a relatively low detection accuracy for edema. However, we consider the core and enhancing regions to be much more important than the edema region, so it is worth sacrificing some accuracy of edema detection to increase the accuracy of core and enhancing tumor detection. Figure 6 shows some visual results, in axial view, from the semantic segmentation of the SegNet models and the SegNet Max DT method.

Discussion and conclusions
In this study, the publicly available BraTS 2017 dataset was used. A DT and four SegNet models were trained with the same training dataset that includes ground truth. A testing dataset without ground truth was used for system evaluation. Our experiments show that the SegNet architectures with 3D dataset and post-processing presented in this work can efficiently and automatically segment brain tumors, completing segmentation for an entire volume in four seconds on a GPU optimized workstation. However, some models like SegNet3 T1 and SegNet4 T2 do not give accurate results because T1 and T2 MRI modalities only give information related to healthy and whole tumor tissues rather than other sub-parts of a tumor like necrosis and enhancing tumor. To tackle this problem, maximum feature maps from all SegNet models were combined, so that only strong and useful features from all SegNet models are presented to the classifier. Four MRI modalities were trained separately for multiple reasons. Firstly, different modalities have different features, so it is faster to train them using different simple models rather than one complex model. Secondly, specific features can be extracted directly related to the specific modality of each SegNet model, providing clinicians with specific information. Finally, one of the most common MRI limitations is the prolonged scan time required to get different MRI modalities, so sometimes, depending on a single modality to detect a brain tumor can be a good solution to save time in clinical applications.
It is worth mentioning that in the proposed method, the training stage is time-consuming, which could be considered a limitation, but the prediction phase rapidly processes the testing dataset to provide semantic segmentation and classification. Although our method can segment core and enhancing tumors better than state-of-the-art methods, it is not better at segmenting complete tumors. However, further post-processing techniques could improve the accuracy of our method, and the SegNet models could be saved as trained models and refined using additional training datasets. Consequently, a longitudinal study using different FCN and CNN architectures should be undertaken to increase the performance of the proposed system.

Len Nokes is a professor of clinical biomechanics, Cardiff University, UK. He holds doctorates in both engineering and medicine and has co-authored four text books on biomechanics. He also has a D.Sc. degree for his work on trauma science. He has published over 100 scientific papers, and is a Fellow of the Institution of Mechanical Engineers and a Chartered Engineer. He is also a Fellow of the Faculty of Sports and Exercise Medicine (UK). His main research areas involve trauma and sports biomechanics.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.