1 Introduction

The ever-growing number of medical imaging devices is not matched by a corresponding growth in the number of human experts who can reliably evaluate the image records. This creates an intensifying need for automated algorithms that can filter out the clearly negative cases and refer the suspected positives to human experts for investigation. The most important requirement for such automated algorithms is to minimize the number of false negatives, which means that they must be sensitive to any sort of lesion in the tissues.

Magnetic resonance imaging (MRI) is a frequently used technique for brain tumor detection and segmentation because of its high contrast and relatively good resolution. However, MRI has a serious drawback: the numerical values in its records do not directly reflect the imaged tissues. To interpret the observed images correctly, it is necessary to adapt them to the context, which is usually performed via histogram normalization. Without this step, comparing two intensity values from two different MRI records would be like comparing the amount of water in two bottles by checking only the depth of the water in them while ignoring the shape of the bottles.

Several solutions have been proposed to normalize or standardize the histograms of MRI records [1,2,3,4,5]. However, none of them was designed to handle focal lesions (tumors, gliomas) that might be present. Some brain tumors grow to 20–30% of the brain volume, which strongly distorts the histogram of every data channel of the MRI record. Luckily, normal and tumor tissues look different in some data channels, and thus we are able to identify the presence of tumors. Normalizing the histograms in batch mode, as done by the most popular technique proposed by Nyúl et al. [1] (referred to as method A1 in the following), and expecting them to look similar whether or not they contain tumors, is prone to damaging the segmentation quality. A1 produces a two-step transformation of intensities using predefined intensity percentiles as landmark points. Several recent studies report using A1 without giving details of its parametrization [6,7,8,9,10,11,12,13,14,15,16]. A few studies indicate the number of landmark points involved: Soltaninejad et al. [17] mentioned using 12 landmarks, while Pinto et al. [18] seem to be using the S1 setting of method A1; see details in Sect. 3.1. Tustison et al. [19] remarked that a simple linear transformation based method can provide slightly better accuracy than A1, without giving details of their method. Such simple linear transforms were applied in [20,21,22], without comparing their effect to that of other histogram normalization methods.

This paper investigates how suitable the most popular histogram normalization method mentioned above is for preprocessing MRI data in a brain tumor segmentation problem. To this end, three sets of algorithms are compared:

  1. Method A1, with several parameter setting schemes that affect the number and position of the landmark points;

  2. Method A2, which is in fact method A1 with landmark points defined by the fuzzy c-means clustering algorithm [23];

  3. Method A3, a simple linear transform with a single parameter, which generalizes the method employed in [20,21,22].

The rest of the paper is structured as follows: Sect. 2 presents the necessary details of background works, Sect. 3 gives details of the compared algorithms, Sect. 4 provides a detailed analysis of the obtained results, while Sect. 5 concludes the study.

2 Background

2.1 Data

This study relies on the 54 low-grade glioma (LGG) volumes of the BraTS 2016 train dataset [24, 25]. All MRI records contain four data channels (T1, T2, T1C, FLAIR), each with 155 slices of \(240 \times 240\) isovolumetric pixels, each representing one cubic millimeter of brain tissue. Each record contains approximately 1.5 million brain pixels. The ground truth provided by human experts is available for each record; it serves as the basis for training and testing the machine learning solution deployed in the segmentation problem.

2.2 Framework

In order to evaluate various histogram normalization techniques, a framework was built that can deploy ensemble learning methods in a tumor segmentation problem based on MRI data. The block diagram of the framework is shown in Fig. 1.

In this study we worked with ensembles of binary decision trees (BDT). Each BDT was trained to separate negative and positive pixels based on the feature vectors of 10000 pixels randomly selected from the train data set. During the training process, BDTs were allowed to place nodes at any depth necessary; most BDTs grew to a maximum depth between 20 and 25. On the other hand, when decisions were made by the trained BDTs, the average depth of the decision-making leaf was around 8.

The tested histogram normalization methods (see Sect. 3) were used as the first step of data processing, as indicated in Fig. 1. The output of each was fed to the subsequent steps, and the statistical evaluation results were collected for comparison.

Fig. 1.

Block diagram of the evaluation framework.

2.3 The A1 Method

The histogram normalization method proposed by Nyúl et al. [1] works as follows:

  1. The previously defined target intensity interval is denoted by \([\alpha ,\beta ]\).

  2. A previously defined set of MRI records \(\mathcal {R}\) is involved in the process; the number of records is denoted by r. The histogram of each record is extracted.

  3. The set of landmark points is defined, for example \(\varLambda = \{p_\mathrm {low}=1\%,p_{L1}=10\%,p_{L2}=20\%,\dots ,p_{L9}=90\%,p_\mathrm {high}=99\%\}\). Let us denote the number of inner landmark points by \(\lambda \) (in the previous example \(\lambda = 9\)).

  4. For every MRI record with index i, \(i=1\dots r\), the intensity values corresponding to the landmark points defined in \(\varLambda \) are identified and denoted by \(y_\mathrm {low}^{(i)},y_{L1}^{(i)},y_{L2}^{(i)},\dots ,y_{L\lambda }^{(i)},y_\mathrm {high}^{(i)}\), respectively.

  5. A first transformation step is performed: a linear transformation is designed in such a way that it maps \(y_{\mathrm {low}}^{(i)}\) to \(\overline{y}_{\mathrm {low}}^{(i)} = \alpha \) and \(y_\mathrm {high}^{(i)}\) to \(\overline{y}_\mathrm {high}^{(i)} = \beta \), and this linear transform is applied to all intensity values situated between \(y_\mathrm {low}^{(i)}\) and \(y_\mathrm {high}^{(i)}\) in the original histogram. The two tails of the histogram are cut, meaning that intensity values below \(y_\mathrm {low}^{(i)}\) are transformed to \(\alpha \), and intensity values above \(y_\mathrm {high}^{(i)}\) are transformed to \(\beta \). For any \(j=1\dots \lambda \), \(y_{Lj}^{(i)}\) is transformed to \(\overline{y}_{Lj}^{(i)}\).

  6. Target intensity values for each inner landmark point with index j (\(j=1\dots \lambda \)) are computed next. These values are the same for all MRI records:

     $$\begin{aligned} \widetilde{y}_{Lj} = \frac{1}{r}\sum \limits _{i=1}^{r} \overline{y}_{Lj}^{(i)}. \end{aligned}$$
     (1)

  7. The target intensity values for the two extremes are \(\widetilde{y}_\mathrm {low}=\alpha \) and \(\widetilde{y}_\mathrm {high}=\beta \).

  8. A final transformation is applied to the first-step transformed intensities in such a way that \(\overline{y}_\mathrm {low}^{(i)}\) is mapped onto \(\widetilde{y}_\mathrm {low}\), \(\overline{y}_\mathrm {high}^{(i)}\) is mapped onto \(\widetilde{y}_\mathrm {high}\), and \(\overline{y}_{Lj}^{(i)}\) is mapped onto \(\widetilde{y}_{Lj}\) for any \(j=1\dots \lambda \). Furthermore, for any \(j=0\dots \lambda \), any intensity value \(\overline{y}^{(i)}\in [\overline{y}_{Lj}^{(i)},\overline{y}_{L,j+1}^{(i)}]\) (where \(\overline{y}_{L0}^{(i)}\) is an alias for \(\overline{y}_\mathrm {low}^{(i)}\), and \(\overline{y}_{L,\lambda +1}^{(i)}\) is an alias for \(\overline{y}_\mathrm {high}^{(i)}\)) is piecewise linearly transformed to a value \(\widetilde{y}\) situated in the interval \([\widetilde{y}_{Lj},\widetilde{y}_{L,j+1}]\):

     $$\begin{aligned} \widetilde{y} = \widetilde{y}_{Lj} + (\widetilde{y}_{L,j+1}-\widetilde{y}_{Lj})\times \frac{\overline{y}^{(i)} - \overline{y}_{Lj}^{(i)}}{\overline{y}_{L,j+1}^{(i)} - \overline{y}_{Lj}^{(i)}}. \end{aligned}$$
     (2)

The algorithm is applied to each data channel separately.
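The two-step mapping can be sketched in a few lines of NumPy. This is an illustrative sketch, not the original implementation of [1]; the function names are ours, and the default percentiles and target interval mirror the example values used in this paper.

```python
import numpy as np

INNER = (10, 20, 30, 40, 50, 60, 70, 80, 90)  # example inner landmarks (lambda = 9)

def a1_train(volumes, inner=INNER, p_low=1, p_high=99, alpha=200.0, beta=1200.0):
    """Learn the common target landmarks (Eq. 1) from a set of training records."""
    acc = np.zeros(len(inner))
    for vol in volumes:
        y_low, y_high = np.percentile(vol, [p_low, p_high])
        scale = (beta - alpha) / (y_high - y_low)        # first (linear) step
        acc += alpha + (np.percentile(vol, inner) - y_low) * scale
    return acc / len(volumes)                            # targets \tilde{y}_{Lj}

def a1_apply(vol, targets, inner=INNER, p_low=1, p_high=99,
             alpha=200.0, beta=1200.0):
    """Second (piecewise linear) step, Eq. 2, applied to one record."""
    y_low, y_high = np.percentile(vol, [p_low, p_high])
    scale = (beta - alpha) / (y_high - y_low)
    # first step with tails cut at alpha and beta
    first = np.clip(alpha + (vol - y_low) * scale, alpha, beta)
    # map this record's (first-step) landmarks onto the shared targets
    src = alpha + (np.percentile(vol, inner) - y_low) * scale
    xp = np.concatenate(([alpha], src, [beta]))
    fp = np.concatenate(([alpha], targets, [beta]))
    return np.interp(first, xp, fp)
```

After `a1_apply`, the landmark intensities of every record coincide with the averaged targets, so corresponding percentiles become directly comparable across records.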

3 Methods

Three approaches are compared in this study, each involving several parameter settings. The goal is to establish which algorithm produces the best final segmentation accuracy and which settings are needed to achieve it. The three approaches are presented in the following subsections.

3.1 Method A1 with Parameter Setting Schemes

The first approach, denoted by A1, applies the algorithm presented in Sect. 2.3. Seven different parameter setting schemes were defined; they are denoted by \(\mathrm{S1} \ldots \mathrm{S7}\) and listed in Table 1. Each setting was tested with values of \(p_\mathrm {low}\) varying between 1% and 5% in steps of 0.5%. The value of \(p_\mathrm {high}\) varied together with \(p_\mathrm {low}\) in such a way that the equality \(p_\mathrm {high}+p_\mathrm {low}=100\%\) held.

Table 1. Various settings for the Approach A1

3.2 Method A2: Landmarks Established by Fuzzy c-Means

The second approach, denoted by A2, employs a mechanism very similar to that of A1, but the landmark points are established using the fuzzy c-means algorithm. The steps of the algorithm are the following:

  1. The previously defined target intensity interval is denoted by \([\alpha ,\beta ]\). The number of inner landmark points is set as \(\lambda \ge 2\). In this study we evaluated cases with \(2\le \lambda \le 7\).

  2. A previously defined set of MRI records \(\mathcal {R}\) is involved in the process; the number of records is denoted by r. The histogram of each record is extracted.

  3. The set of landmark points is \(\varLambda = \{p_\mathrm {low},p_{L1},p_{L2},\dots ,p_{L,\lambda },p_\mathrm {high}\}\), but only \(p_\mathrm {low}\) and \(p_\mathrm {high}\) have predefined fixed values.

  4. A first transformation step is performed: a linear transformation is designed in such a way that it maps \(y_{\mathrm {low}}^{(i)}\) to \(\overline{y}_{\mathrm {low}}^{(i)} = \alpha \) and \(y_\mathrm {high}^{(i)}\) to \(\overline{y}_\mathrm {high}^{(i)} = \beta \), and this linear transform is applied to all intensity values situated between \(y_\mathrm {low}^{(i)}\) and \(y_\mathrm {high}^{(i)}\) in the original histogram. The two tails of the histogram are cut, meaning that intensity values below \(y_\mathrm {low}^{(i)}\) are transformed to \(\alpha \), and intensity values above \(y_\mathrm {high}^{(i)}\) are transformed to \(\beta \).

  5. For every MRI record with index i, \(i=1\dots r\), the transformed intensity values undergo histogram-based quick fuzzy c-means clustering with \(c=\lambda \) clusters. The obtained cluster prototypes, sorted in increasing order \(v_1,v_2,\dots v_\lambda \), are then assigned as dynamically established landmark points: \(\overline{y}_{Lj}^{(i)}=v_j\,\forall j=1\dots \lambda \).

  6. Target intensity values \(\widetilde{y}_{Lj}\) for each inner landmark point with index j (\(j=1\dots \lambda \)) are computed next, using Eq. (1). These values are the same for all MRI records.

  7. The target intensity values for the two extremes are \(\widetilde{y}_\mathrm {low}=\alpha \) and \(\widetilde{y}_\mathrm {high}=\beta \).

  8. The final transformation is applied the same way as in the original A1 algorithm, presented in Sect. 2.3.

The algorithm is applied to each data channel separately.
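Step 5 above, the histogram-based quick fuzzy c-means, can be sketched as follows. This is an illustrative implementation using the standard FCM update rules with fuzzifier \(m=2\), not the authors' exact code; the function name and the bin count are our assumptions.

```python
import numpy as np

def fcm_landmarks(vol, c=3, m=2.0, n_bins=256, n_iter=50):
    """Histogram-based fuzzy c-means: returns the c sorted cluster
    prototypes used as dynamic inner landmarks (step 5 of method A2)."""
    hist, edges = np.histogram(vol, bins=n_bins)
    x = 0.5 * (edges[:-1] + edges[1:])               # bin centres
    v = np.linspace(x.min(), x.max(), c + 2)[1:-1]   # initial prototypes
    for _ in range(n_iter):
        d = np.abs(x[None, :] - v[:, None]) + 1e-12  # (c, n_bins) distances
        u = d ** (-2.0 / (m - 1.0))                  # unnormalized memberships
        u /= u.sum(axis=0, keepdims=True)            # normalize over clusters
        w = (u ** m) * hist[None, :]                 # weight by bin population
        v = (w @ x) / w.sum(axis=1)                  # prototype update
    return np.sort(v)
```

Working on the histogram rather than on the raw voxels keeps each iteration at O(c · n_bins), which is what makes the per-record clustering cheap.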

3.3 Method A3: Linear Transform with One Parameter

The third approach, denoted by A3, is a generalization of the technique proposed in our previous paper [21]. This method uses a single linear transformation whose coefficients depend on the histogram of the original MRI volume. In contrast to the previous two approaches, the normalization of an MRI record does not depend on other MRI records.

  1. The previously defined target intensity interval is denoted by \([\alpha ,\beta ]\). The algorithm uses a single parameter q, which controls the compactness of the final histogram.

  2. The histogram of the current MRI record is extracted. The 25-percentile and 75-percentile intensity values are identified and denoted by \(y_{25}\) and \(y_{75}\).

  3. The target intensities for the 25-percentile and 75-percentile intensity values are established using the formulas

     $$\begin{aligned} \widetilde{y}_{25}=\frac{1}{2}\left[ (\beta +\alpha )-\frac{\beta -\alpha }{q}\right] \quad \mathrm {and} \quad \widetilde{y}_{75}=\frac{1}{2}\left[ (\beta +\alpha )+\frac{\beta -\alpha }{q}\right] . \end{aligned}$$
     (3)

  4. The coefficients of the linear transform \(y\rightarrow ay+b\) are computed in such a way that \(y_{25}\) and \(y_{75}\) are transformed to \(\widetilde{y}_{25}\) and \(\widetilde{y}_{75}\), respectively, using the formulas

     $$\begin{aligned} a=\frac{\beta -\alpha }{q(y_{75}-y_{25})} \quad \mathrm {and} \quad b=\widetilde{y}_{25}-\frac{(\beta -\alpha )y_{25}}{q(y_{75}-y_{25})}. \end{aligned}$$
     (4)

  5. Any intensity y from the input MRI volume becomes

     $$\begin{aligned} {\widetilde{y}=\left\{ \begin{array}{ll}\alpha \quad \quad \quad &{} \mathrm {if}\, ay+b<\alpha \\ ay+b &{} \mathrm {if}\, \alpha \le ay+b \le \beta \\ \beta &{} \mathrm {if}\, ay+b>\beta \end{array}\right. }. \end{aligned}$$
     (5)

The algorithm is applied to each data channel separately. In our previous works [20,21,22], this approach was used with parameter setting \(q=5\).
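Method A3 fits in a few lines of NumPy. The sketch below is illustrative (the function name is ours); the defaults use \(q=5\) as in [20,21,22] and the target interval \([\alpha ,\beta ]=[200,1200]\) used later in Sect. 4.

```python
import numpy as np

def a3_normalize(vol, q=5.0, alpha=200.0, beta=1200.0):
    """One-parameter linear normalization (Eqs. 3-5)."""
    y25, y75 = np.percentile(vol, [25, 75])
    a = (beta - alpha) / (q * (y75 - y25))                 # Eq. (4)
    y25_target = 0.5 * ((beta + alpha) - (beta - alpha) / q)  # Eq. (3)
    b = y25_target - a * y25
    return np.clip(a * vol + b, alpha, beta)               # Eq. (5)
```

A larger q compresses the bulk of the histogram toward the middle of \([\alpha ,\beta ]\), leaving more headroom before clipping for outlier intensities such as those produced by tumor tissue.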

Table 2. Overall Dice scores obtained using Approach A1 using various settings
Table 3. Overall Dice scores obtained using Approach A2 using various settings

4 Results and Discussion

Each of the three algorithms was tested with various settings using the same evaluation framework, with the target intensity interval bounded by \(\alpha =200\) and \(\beta =1200\), corresponding to approximately 10-bit resolution. The 54 LGG volumes of the BraTS 2016 data set underwent ten-fold cross-validation using the BDT ensemble based classifier described in Sect. 2. Each ensemble consisted of 125 BDTs, each trained with 10000 randomly selected feature vectors from the train data, of which 92% were negatives and 8% positives.

From the 104 generated features (for each of the 4 observed channels: minimum, maximum, and average extracted from the \(3\times 3\times 3\) neighborhood; average and median extracted from planar neighborhoods of size ranging from \(3\times 3\) to \(11\times 11\); four directional gradients and eight directional Gabor wavelet values), the 13 most relevant features (minimum, maximum, and average of T2 and FLAIR, maximum and average of T1C, and minimum of T1 from the \(3\times 3 \times 3\) neighborhood; average of T1C, T2, and FLAIR from the \(11\times 11\) neighborhood; average of FLAIR from the \(3\times 3\) neighborhood) were included in the feature vector; details are presented in our previous paper [22].

The outcome of the classification produced by the ensemble underwent a post-processing step that relabeled each pixel according to its neighbors: a pixel was declared a final positive if at least one third of its neighbors had been declared positive by the ensemble. The main evaluation criterion is the Dice score (DS), defined as \(\mathrm {DS}=2\mathrm {TP}/(2\mathrm {TP}+\mathrm {FP}+\mathrm {FN})\), where \(\mathrm {TP}\), \(\mathrm {FP}\), and \(\mathrm {FN}\) represent the number of true positives, false positives, and false negatives, respectively. Average Dice scores for each MRI record were established after the ten-fold cross-validation.
Finally, the overall Dice score was computed for each approach and setting, based on all pixels from all volumes. Results are exhibited in Tables 2, 3 and 4.
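The evaluation criterion can be expressed directly; a minimal sketch of the Dice score on boolean segmentation masks:

```python
import numpy as np

def dice_score(pred, gt):
    """DS = 2*TP / (2*TP + FP + FN) for boolean segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # true positives
    fp = np.count_nonzero(pred & ~gt)   # false positives
    fn = np.count_nonzero(~pred & gt)   # false negatives
    return 2.0 * tp / (2.0 * tp + fp + fn)
```

Equivalently, \(\mathrm{DS}=2|P\cap G|/(|P|+|G|)\); note that the score ignores true negatives, which dominate in brain MRI where tumor pixels form a small minority.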

Table 4. Overall Dice scores obtained using Approach A3 using various settings
Fig. 2.

Dice scores obtained on individual LGG records; the best performance of the three approaches plotted against one another: (a) A2 vs. A1; (b) A3 vs. A1.

The best achieved overall Dice score (ODS) is 82.395%. Most of the evaluated approaches and settings led to ODS values over 80%. The classical A1 approach barely reached 81.5%, with its best setting using the landmark set \(\{p_\mathrm {low}=2\%, 25\%, 50\%, 75\%, p_\mathrm {high}=98\%\}\), in which the middle landmark point is virtually optional. A larger number of landmarks, as used for example in [17, 18], led in our tests to ODS values around 80.5%, well below optimal.

The A2 approach achieved its best ODS values, around 82%, when using 2 or 3 inner landmarks with \(p_\mathrm {low}\) ranging between 1.5% and 2.5%. The accuracy is slightly better than that of approach A1, while the best-performing scenario is quite similar.

The A3 approach scores ODS values above 82% over a wide interval of its parameter q. The best accuracy was achieved at \(q=4.5\), which means that in each data channel of the MRI records, all original intensity values undergo a linear transformation into the target interval [200, 1200] in such a way that the 25-percentile is mapped to approximately 589, the 75-percentile to approximately 811 (cf. Eq. (3)), and the tails of the transformed histogram are clipped at 200 and 1200. Each histogram is normalized independently; it does not depend on the histograms of other records or other data channels.

Figure 2 compares the three approaches when applied to individual MRI volumes, each approach represented with its overall best setting. The figure also shows that A3 and A2 can perform slightly better than A1, but this slight superiority holds only on average: the segmentation accuracy of individual MRI records can be either better or worse, with virtually the same probability. Our tests confirm the observation of Tustison et al. [19], who remarked that a well-designed simple linear transformation can perform better than earlier algorithms like A1 in such tumor segmentation problems. Further tests involving more data, more algorithms, and further quality indicators could provide stronger evidence of this superiority. Our results do not mean that A3 leads to better accuracy than the frequently used A1 in all segmentation problems; but when the goal is tumor detection, it is advisable to apply histogram normalization via approach A3.

5 Conclusions

This study investigated the effect of various histogram normalization methods on the final accuracy in an MRI data based brain tumor segmentation problem. Two approaches were proposed and compared to the most frequently used and most cited algorithm of this kind. Tests revealed a slight superiority of both proposed algorithms over the established one.