1 Introduction

The detection and diagnosis of brain tumors from MRI is crucial to reduce the rate of casualties. Brain tumors are difficult to cure because the brain has a very complex structure, with tissues interconnected in a complicated manner. Despite many existing approaches, robust and efficient segmentation of brain tumors remains an important and challenging task. Tumor segmentation and classification are challenging because tumors vary in shape, appearance and location, and it is hard to fully segment and classify a brain tumor from mono-modality scans because of its complicated structure. MRI can capture multiple images, known as multi-modality images, which provide a detailed view of brain structure and enable efficient tumor classification [1]. Figure 1 shows different brain MRI modalities.

Fig. 1 Brain multi-modality MRI images showing a T1, b T2, c T1-Contrast (T1C) and d fluid-attenuated inversion recovery (FLAIR)

Brain tumor segmentation and detailed classification based on MRI images have received considerable interest over the last decades. Many studies have explored uni-modality MRI; more recently, researchers have turned to multi-modality MRI to increase the accuracy of tumor segmentation and classification.

Machine learning and edge/region-based approaches have been used with multi-modality (T1, T2, T1C and FLAIR) MRI [2]. The machine learning techniques often rely on voxel intensities and texture features, with each voxel classified on the basis of its feature vector [2]. Intensity, intensity difference, neighborhood and other texture features have been explored on a benchmark dataset [3]. To the best of our knowledge, wavelet-based features have not yet been explored on a multi-modality MRI brain tumor dataset. In this paper, we investigate wavelet texture features along with various machine learning algorithms.

In this work, we use multi-modality images to classify brain tumors. This work makes the following contributions:

1. extracting wavelet-based texture features to predict tumor labels and

2. exploring supervised classifiers for brain tumor classification.

This paper is organized as follows: Sect. 2 reviews the related work; Sect. 3 presents the proposed algorithm; Sect. 4 presents the results; Sect. 5 discusses them; and Sect. 6 concludes the paper.

2 Literature review

Brain tumor segmentation is a challenging process because tumors exhibit inhomogeneous intensities and unclear boundaries. Intensity normalization or bias field correction is often applied to balance the effect of magnetic field inhomogeneity [1]. Intensities, neighborhood information and texture are common features used in various studies [1,2,3]. The machine learning and edge/region-based segmentation techniques are summarized in Table 1, which presents a concise review of previous work. A few techniques are fully automatic, while the remaining ones need user involvement.

Table 1 Brain tumor extraction and classification by machine learning or edge/region-based algorithms

Fluid vector flow (FVF) [4] was introduced to address the problems of limited capture range and poor convergence for concavities. Harati et al. [5] demonstrated an improved fuzzy connectedness (FC) algorithm, where seed points are selected automatically to segment the tumor region. Saha et al. [6] proposed a fast novel method to locate a bounding box around the tumor or edema using the Bhattacharya coefficient [7]. In their clustering technique, the axial view of the brain image is divided into left and right halves, and a rectangle is used to compare corresponding regions of the two halves to find the most dissimilar region. Zhu et al. [8] proposed a semiautomatic brain tumor segmentation method in which initial segmentation is performed through the ITK-Snap tool; voxel-based segmentation and deformable shape-based segmentation are combined into a software pipeline. Sachdeva et al. [9] used texture information together with intensity in an active contour model (ACM) to overcome issues observed in earlier techniques such as FVF, boundary vector flow (BVF) and gradient vector flow (GVF): in those techniques, selection of false edges or false seeds leads to a preconvergence problem, and selection of weak edges leads to over-segmentation due to the edema around the tumor. Rexilius et al. [10] proposed a new region growing method for brain tumor segmentation. A probabilistic model is used to achieve the initial segmentation, which is then refined by region growing for better results. A global affine and non-rigid registration method is used to register multi-spectral histograms gathered from patient data with a reference histogram.

Corso et al. [11] used a top-down approach to distribute the product over a generative model. A sparse graph is then given as input to a graph-cut method, where each edge uses features to compute the affinity between neighboring nodes. Segmentation by weighted aggregation (SWA) is used to provide a multi-level segmentation of the data. Ruan et al. [12] proposed a supervised machine learning technique to track the tumor volume. The process consists of two main steps: first, only the T1 modality is used to identify the abnormal area, which keeps the computation efficient; second, the abnormal area is extracted from all modalities and the results are fused to segment the tumor. Irfan et al. [13] introduced a technique in which the brain is first separated from the non-brain part, and the ROI is then used with saliency information to bound the search of the normalized cut (N-Cut) [14] method. The saliency information combines multi-scale contrast and image curvature points; the multi-scale contrast image is obtained by decomposing the image at multiple scales using a Gaussian pyramid (GP) and computing the Euclidean distance to neighboring pixels at those scales.

Automatic segmentation has been performed using a random forest (RF) [3], with features including MR sequence intensities, neighborhood information, context information and texture; post-processing is applied to improve the results. Zhao et al. [15] used a Markov random field (MRF) model on supervoxels to segment the tumor automatically. An ACM combining edge-based and region-based techniques is used in [16], where the user draws a region of interest (ROI) in different images depending on tumor type and grade.

In machine learning, the availability of benchmark data has become important for comparing different algorithms, and this idea has recently become popular in medical image analysis as well. Sometimes the word "challenge" is used instead of "benchmark"; both share the characteristic that different researchers optimize their own algorithms on a training dataset provided by the organizers of the event and then apply them to a common, independent test dataset. A benchmark differs from other published comparisons in that every group of researchers uses the same dataset for its algorithm. The BraTS benchmark was established in 2012, and the first event was held in the same year [2]. The dataset consists of real and simulated images.

Various studies report different accuracy measures on different datasets, as shown in Table 1; it is therefore difficult to compare them and draw a conclusion about the best technique. Furthermore, the Dice and Jaccard values in previous studies leave room for further improvement in classification accuracy; we therefore explore wavelet-based texture features, which have not been applied to the MICCAI BraTS dataset before.

3 Proposed method

The proposed algorithm uses the MICCAI BraTS dataset. The main flow of the proposed technique is presented in Fig. 2, with further details in the following subsections.

Fig. 2 Block diagram of the proposed method, which takes multi-modality MRI as input and gives tumor labels as output

3.1 Preprocessing

The BraTS dataset has four MRI modalities: T1, T2, T1C and FLAIR. Each scan is rigidly co-registered with the T1C modality to homogenize the data, because T1C has the highest spatial resolution in most cases. A linear interpolator is used to resample all images to 1-mm isotropic resolution in axial orientation. Images are skull-stripped, with expert annotations provided [2]. All images are visualized through ITK-Snap [17], while histogram matching is performed with Slicer3D [18] to enhance image contrast, using a high-contrast image as the reference.
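As a hedged illustration of the histogram-matching step (not the Slicer3D pipeline actually used), one volume can be matched to a high-contrast reference with scikit-image; the volumes below are placeholders.

    # Histogram-matching sketch; `moving` and `reference` are hypothetical
    # 3D volumes, with the reference chosen for its high contrast.
    import numpy as np
    from skimage.exposure import match_histograms

    moving = np.random.rand(64, 64, 64)     # placeholder scan to adjust
    reference = np.random.rand(64, 64, 64)  # placeholder reference scan
    matched = match_histograms(moving, reference)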

The next preprocessing step is to determine the bounding box around the tumor region. Our adapted technique for locating the bounding box consists of the following steps (a minimal code sketch follows the list):

1. Remove completely blank slices from the ground truth; the remaining slices contain the tumor.

2. Create a mask and use it to locate the bounding box in the ground truth.

3. Use the above bounding box to crop the multi-modality images.
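The steps above can be sketched as follows, assuming the ground truth is a 3D NumPy label volume with 0 for background; this is an illustration, not the authors' code.

    import numpy as np
    from scipy import ndimage

    gt = np.zeros((32, 32, 32), dtype=int)
    gt[10:20, 8:18, 12:22] = 2                       # placeholder tumor labels

    mask = gt > 0                                    # step 2: binary tumor mask
    box = ndimage.find_objects(mask.astype(int))[0]  # bounding slices; blank
                                                     # slices (step 1) fall outside
    # step 3: crop each modality with the same box, e.g., t1_cropped = t1[box]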

3.2 Feature extraction

The proposed feature extraction includes four types of features: (1) intensity, (2) intensity difference, (3) neighborhood information and (4) wavelet-based texture features.

Intensity features (the four raw modalities) are shown in Fig. 1. Intensity difference features are the differences between these modalities; we used three prominent intensity differences that represent the global characteristics of brain tissues [19], as shown in Fig. 3.

Fig. 3 Intensity difference features: a T1C-FLAIR, b T1-T1C, c T2-T1C

Neighborhood information features include the mean, median and range of the 3D neighborhood centered at the voxel under consideration. Isotropic neighborhood sizes of 3, 9, 15 and 19 mm were used in 3D, as these were found appropriate for the mean and range filters [3], while the median filter was used with a neighborhood size of 3 mm.
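A minimal sketch of these neighborhood features with scipy.ndimage is given below; filter sizes are in voxels, which correspond to millimeters here because the volumes were resampled to 1-mm isotropic resolution.

    import numpy as np
    from scipy import ndimage

    def neighborhood_features(volume, sizes=(3, 9, 15, 19)):
        feats = []
        for s in sizes:
            feats.append(ndimage.uniform_filter(volume, size=s))   # mean
            feats.append(ndimage.maximum_filter(volume, size=s)
                         - ndimage.minimum_filter(volume, size=s)) # range
        feats.append(ndimage.median_filter(volume, size=3))        # median
        return np.stack(feats, axis=-1)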

The novelty of the proposed approach lies in the wavelet features, which have not previously been explored or applied on the MICCAI BraTS dataset. Wavelets provide multi-resolution analysis, allowing images to be decomposed and visualized at different scales [20]. The discrete wavelet transform can be defined as:

$$W_{j,k} \left( t \right) = 2^{{\frac{ - j}{2}}} \vartheta \left( {2^{ - j} t - k} \right)$$
(1)

where \(j,k \in Z\); j controls the dilation and k the translation of the wavelet function, and \(\vartheta \left( t \right)\) is the mother wavelet. Wavelet decomposition scales and shifts this initial wavelet and convolves it with the original image, and the original image can be reconstructed from the decomposition without loss of information [21]. Wavelet-based texture segmentation has been compared with single-resolution texture spectrum, co-occurrence and local linear transform methods on the Brodatz dataset, where it outperformed the other approaches [22]. Wavelets have also been applied to 3D brain, liver and kidney images to produce accurate reconstructions from the decomposed subimages [23].
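A minimal sketch with PyWavelets illustrates Eq. (1) and the lossless-reconstruction property: a two-level Symlets-8 decomposition of a signal is inverted back to the original.

    import numpy as np
    import pywt

    signal = np.random.rand(128)
    coeffs = pywt.wavedec(signal, 'sym8', level=2)  # [cA2, cD2, cD1]
    restored = pywt.waverec(coeffs, 'sym8')
    assert np.allclose(signal, restored)            # no information lost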

For 3D wavelet decomposition, the image volume is initially convolved along the x dimension with a low-pass filter to produce an approximation subband (L) and with a high-pass filter to produce a detail subband (H). In the same way, the resulting subbands are further convolved along the y and then the z dimension with both the low-pass and high-pass filters. As a result, eight subbands (LLL, LLH, LHL, HLL, LHH, HLH, HHL and HHH) are obtained [21], where L indicates a low-pass-filtered and H a high-pass-filtered subband. Level-2 decomposition is achieved by treating the LLL subband as the main image and decomposing it with the same process.
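In PyWavelets this one-level 3D decomposition is available as dwtn, which labels subbands 'a' (approximation, low-pass) and 'd' (detail, high-pass) per axis, so 'aaa' corresponds to LLL and 'ddd' to HHH; a hedged sketch:

    import numpy as np
    import pywt

    volume = np.random.rand(64, 64, 64)
    subbands = pywt.dwtn(volume, 'sym8')
    print(sorted(subbands))  # ['aaa', 'aad', 'ada', 'add', 'daa', 'dad', 'dda', 'ddd']
    level2 = pywt.dwtn(subbands['aaa'], 'sym8')  # level 2: decompose LLL again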

A block diagram of wavelet-based feature extraction is shown in Fig. 4. The T1C modality or an intensity difference image (T1C-FLAIR, T1-T1C or T2-T1C) is given as input to the 3D wavelet decomposition. The input image is decomposed into subbands, and the subbands containing useful information are selected based on their discriminatory ability as assessed by visual analysis. Feature images are reconstructed from the selected subbands, and a Gaussian filter is applied after an absolute-value function to make the features more prominent. We performed decomposition up to the second level, because third-level subbands were not found useful in our experiments: at that level the scale is too small to contain sufficient discriminatory information. We tried various wavelet families for decomposition, including Daubechies-4, Symlets-4 and Symlets-8; Symlets-8 was selected due to its superior performance.

Fig. 4 Block diagram of wavelet-based feature extraction; the input to wavelet decomposition can be the intensity differences or the T1C modality, and the output is the feature images [24]

Wavelet reconstruction is the process of constructing a feature image from each subband; useful feature images are then selected based on the discriminatory information revealed by visual analysis. We applied an absolute-value function and Gaussian smoothing to make the edges of the feature images more prominent [24], as shown in Fig. 5.
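A sketch of this reconstruction step, assuming PyWavelets and SciPy: all subbands except the selected one are zeroed (set to None) before the inverse transform, then the absolute value and Gaussian smoothing emphasize the texture edges.

    import numpy as np
    import pywt
    from scipy import ndimage

    def subband_feature(volume, keep='ddd', wavelet='sym8', sigma=1.0):
        sub = pywt.dwtn(volume, wavelet)
        kept = {k: (v if k == keep else None) for k, v in sub.items()}
        feat = pywt.idwtn(kept, wavelet)             # None entries act as zeros
        return ndimage.gaussian_filter(np.abs(feat), sigma)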

Fig. 5 Selected feature images: a HHH1, b HHL1, c HLH1, d LHH1, e HHH2, f HHL2, g HLH2, h LHH2, where H denotes high frequency, L denotes low frequency and the rightmost number denotes the level of decomposition

In this work, we extracted intensity, intensity differences, neighborhood information and wavelet-based texture features. In the next section, we will use these features to perform supervised classification.

3.3 Classification

Supervised classification is a machine learning approach in which training data are used to construct a model and test data are used to evaluate it on unseen samples, measuring the performance of the algorithm. Many classifiers exist; below we discuss those explored in this work.

The kNN (k-nearest neighbor) classifier is a lazy learning technique that calculates the Euclidean distance from a query point to all training points. The class label is then assigned by majority voting among the 'k' nearest neighbors.

Random forest (RF) is an ensemble of decision trees. During training, each tree is trained on data sampled with replacement from the training set. Multiple trees are trained to decrease the correlation between trees and thus reduce the variance of the ensemble. At test time, each tree casts a vote, and the majority vote labels the unseen data. RF is useful because it gives internal estimates of error and variable importance, and it can easily be parallelized [25]. RF became a major data analysis tool within a short period of time, popular because it can be applied to nonlinear and high-dimensional data [26].

AdaBoostM2 (adaptive boosting) [27] is the multi-class extension of AdaBoostM1 [27]. It is a boosting algorithm in which many weak learners are combined into a strong classifier, with instances reweighted rather than resampled (as in bagging) [25].

Random undersampling boosting (RUSBoost) is suitable for classifying imbalanced data, where instances of one class outnumber the other many times over. Standard machine learning techniques often fail to classify such skewed data efficiently; RUSBoost addresses this by combining sampling with boosting. We explored these classification algorithms; the results are reported in the next section.
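A hedged sketch of such a comparison on synthetic five-class data is shown below; scikit-learn's AdaBoostClassifier stands in for AdaBoostM2, and RUSBoostClassifier comes from the separate imbalanced-learn package.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from imblearn.ensemble import RUSBoostClassifier

    X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                               n_classes=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifiers = {
        'RF': RandomForestClassifier(n_estimators=25, random_state=0),
        'kNN': KNeighborsClassifier(n_neighbors=5),
        'AdaBoost': AdaBoostClassifier(n_estimators=50, random_state=0),
        'RUSBoost': RUSBoostClassifier(n_estimators=50, random_state=0),
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(name, round(clf.score(X_test, y_test), 3))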

4 Results

In this section, we present the results and compare them with previous work on the BraTS dataset of real patients, containing 20 high-grade (HG) and 10 low-grade (LG) subjects. Several measures are used for quantitative evaluation, and visual segmentation results are also shown. The results were obtained on an HP ProBook 4540 (Core i5, 2.5 GHz, 8 GB RAM) using MATLAB 2013a; testing a new patient takes about 2 min.

4.1 Out of bag error (ooBError)

The ooBError is the mean-squared or misclassification error for the out-of-bag observations during training. No separate test set or cross-validation is needed to obtain an unbiased error estimate for test cases, because the ooBError is calculated internally during RF model construction. Figure 6 shows that the ooBError is lowest when 25 trees are used.
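This behavior can be reproduced in outline with scikit-learn (a stand-in for the MATLAB implementation actually used): the RF reports an out-of-bag accuracy, so the ooBError is one minus that score.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                               n_classes=5, random_state=0)
    for n_trees in (10, 25, 50, 100):
        rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                    random_state=0).fit(X, y)
        print(n_trees, 'trees -> ooBError', round(1.0 - rf.oob_score_, 3))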

Fig. 6 Relationship between the number of trees and the ooBError: the ooBError decreases rapidly until the number of trees reaches 25 and then levels off

4.2 Evaluation measures

We used several evaluation measures to assess the results; they are described below. The Dice coefficient is the similarity/overlap between two images [28] and is explained graphically in Fig. 7:

$${\text{Dice}}\left( {P,T} \right) = \frac{{2\left| {P_{1} \cap T_{1} } \right|}}{{\left| {P_{1} } \right| + \left| {T_{1} } \right|}}$$
(2)

where \(\cap\) is the logical AND operator and | | is the size of a set (i.e., the number of voxels belonging to it). \(P_{1}\) and \(T_{1}\) denote the sets of voxels belonging to the algorithm's prediction and the ground truth, respectively. The Dice score normalizes the number of true positives by the average size of the predicted and ground truth segmented areas, giving the voxel-wise overlap between the result and the ground truth [2].

Fig. 7 Diagram from which the Dice formula is derived. \(T_{1}\) is the ground truth lesion, and \(T_{0}\) is the area outside \(T_{1}\) within the brain. \(P_{1}\) is the algorithm's predicted lesion, and \(P_{0}\) is the predicted area outside \(P_{1}\) within the brain. The overlap between \(T_{1}\) and \(P_{1}\) gives the true positives [2]

The Jaccard coefficient measures the similarity between two images and is defined as the size of the intersection divided by the size of the union of the two sets [29]. Also known as the Jaccard index, it is measured as:

$${\text{Jaccard}}\left( {P,T} \right) = \frac{{P_{1} \cap T_{1} }}{{P_{1} \cup T_{1} }}$$
(3)

Sensitivity is the true positive rate; it is prioritized when the disease is serious and we want to identify all possible true cases. It is measured as:

$${\text{Sensitivity}}\left( {P,T} \right) = \frac{{P_{1} \cap T_{1} }}{{T_{1} }}$$
(4)

Specificity is the true negative rate; it is prioritized when the treatment is drastic and we only want to treat those who surely have the disease. It is measured as:

$${\text{Specificity}}\left( {P,T} \right) = \frac{{P_{0} \cap T_{0} }}{{T_{0} }}$$
(5)
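Eqs. (2)-(5) translate directly into code; a minimal sketch on binary NumPy masks, where pred and truth mark the lesion voxels (\(P_{1}\), \(T_{1}\)) and the complements are taken within the brain:

    import numpy as np

    def evaluate(pred, truth):
        p1, t1 = pred.astype(bool), truth.astype(bool)
        tp = np.logical_and(p1, t1).sum()                # |P1 AND T1|
        tn = np.logical_and(~p1, ~t1).sum()              # |P0 AND T0|
        return {
            'Dice': 2.0 * tp / (p1.sum() + t1.sum()),    # Eq. (2)
            'Jaccard': tp / np.logical_or(p1, t1).sum(), # Eq. (3)
            'Sensitivity': tp / t1.sum(),                # Eq. (4)
            'Specificity': tn / (~t1).sum(),             # Eq. (5)
        }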

4.3 Hierarchical classification

Each voxel is initially classified as one of five target classes: background (0), necrosis (1), edema (2), non-enhancing (3) and enhancing (4). Tumor regions are then computed hierarchically from these class labels. Our classification system extracts the following three tumor regions (a minimal code sketch follows the list):

1. Complete tumor: the combination of classes (1) + (2) + (3) + (4), separated from class (0).

2. Core tumor: the complete tumor identified above, excluding edema (2).

3. Enhancing tumor: after core tumor classification, enhancing tumor (4) is separated from necrosis and non-enhancing (1) + (3).
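Given a label volume with the class codes above, the three region masks reduce to a few NumPy operations (a sketch, with a placeholder volume):

    import numpy as np

    labels = np.random.randint(0, 5, size=(32, 32, 32))  # placeholder labels

    complete = labels > 0              # (1)+(2)+(3)+(4) vs background (0)
    core = np.isin(labels, (1, 3, 4))  # complete tumor minus edema (2)
    enhancing = labels == 4            # (4) vs necrosis/non-enhancing (1)+(3)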

For our initial experiments, which identify the experimental choices, we performed leave-one-out cross-validation on a subset of the BraTS data (four real HG patients), under the assumption that the identified choices will perform similarly on the complete BraTS data; the restriction to a subset was made for computational reasons. Table 2 compares the different feature types and shows that wavelet features help improve the Dice coefficient. We then used all extracted features to compare different classifiers, as shown in Table 3.

Table 2 Classification performed with varying feature types to analyze the importance of the extracted features
Table 3 Comparison of RF, kNN, AdaBoostM2 and RUSBoost (leave-one-out cross-validation) for brain tumor classification

4.4 Quantitative evaluation

Table 3 shows that RF performs best among the explored classifiers on the extracted features; we therefore used the RF classifier, and in Table 4 the quantitative results of the proposed method are compared with the results presented at the MICCAI BraTS challenge. Table 5 shows the detailed results of the proposed methodology.

Table 4 Comparison of the Dice coefficient on the BraTS dataset [2] for high-grade (HG) and low-grade (LG) subjects
Table 5 Average results (leave-one-out cross-validation) of the proposed method on different metrics for high-grade (HG) and low-grade (LG) data

4.5 Visual results

Visual results of the work are shown in Fig. 8, indicating the success of brain tumor classification with the proposed method.

Fig. 8 Segmentation results using the proposed method. Each row represents a distinct subject: a T1, b T2, c T1C, d FLAIR, e ground truth and f proposed method's results

5 Discussion

We proposed an algorithm for brain tumor classification. It uses the MICCAI BraTS data and relies on intensity-related features and wavelet texture features. Applied to the BraTS challenge training dataset, it gives better results than state-of-the-art methods, as shown in Table 4.

In the feature extraction process, we calculated intensity, intensity difference and neighborhood information features [3], as well as the wavelet texture features. For the wavelet features, we initially decomposed the multi-modality images to the third level and visualized all resulting feature images. After this visualization we restricted the wavelet decomposition to the second level, because the third-level feature images are too small to be useful. We analyzed all first- and second-level feature images and selected only those containing high-frequency components. Future work will focus on making the subband selection process more automatic, rather than based on visual inspection, and on testing the algorithm on a larger dataset to verify robustness.

We used all extracted features with different classifiers (kNN, RF, AdaBoostM2 and RUSBoost), as shown in Table 3, and observed that RF suits our extracted features best for classifying brain tumors. Leave-one-out cross-validation was performed separately for HG and LG subjects on the real dataset. We further performed a detailed classification of the tumor into three regions: complete tumor, core tumor and enhancing tumor. The proposed technique gives results that are comparable to or better than those of existing techniques.

6 Conclusion

This paper presented an algorithm that hierarchically classifies the tumor into three regions: complete tumor, core tumor and enhancing tumor. Intensity, intensity difference, neighborhood information and wavelet features were extracted from multi-modality MRI scans and used with various classifiers. The use of wavelet-based texture features with the RF classifier increased the classification accuracy, as evidenced by the quantitative results of the proposed method, which are comparable to or higher than the state of the art.