
1 Introduction

Automatic segmentation of medical images has become a crucial task due to the huge amount of data produced by imaging devices. Many popular tools, such as FSL [42] and FreeSurfer [11], are dedicated to this aim. Several techniques exist to achieve segmentation. We can broadly classify them into thresholding methods [21, 28, 43], clustering methods [7, 31, 39], edge detection methods [5, 30, 35], region-growing methods [22, 34], watershed methods [3, 24], model-based methods [6, 20, 25, 38] and Hidden Markov Random Field methods [1, 14,15,16,17,18,19, 29, 42]. Threshold-based methods are the simplest ones and require only one pass through the pixels. They begin with the creation of an image histogram; thresholds are then used to separate the different image classes. For example, to segment an image into two classes, foreground and background, a single threshold is necessary. The disadvantage of threshold-based techniques is their sensitivity to noise. Region-based methods assemble neighboring pixels of the image into non-overlapping regions according to some homogeneity criterion (gray level, color, texture, shape or model). We distinguish two categories: region-growing methods and split-merge methods. They are effective when the neighboring pixels within one region have similar characteristics. In model-based segmentation, a model is built for a specific anatomic structure by incorporating prior information on shape, location and orientation. The presence of noise degrades segmentation quality; a noise-removal phase is therefore generally an essential preliminary step. Hidden Markov Random Fields (HMRF) [12] provide an elegant way to model the segmentation problem. The model is based on the MAP (Maximum A Posteriori) criterion [40], and MAP estimation leads to the minimization of an objective function [37]. Therefore, optimization techniques are necessary to compute a solution. The Conjugate Gradient algorithm [26, 33, 36] is one of the most popular optimization methods.

This paper presents an unsupervised segmentation method based on the combination of the Hidden Markov Random Field model and the Conjugate Gradient algorithm. This method, referred to as HMRF-CG, does not require preprocessing, feature extraction, training or learning. Brain MR image segmentation has attracted particular attention in medical imaging. Thus, our tests focus on BrainWeb [8] and IBSR images, for which the ground truth is known. Segmentation quality is evaluated using the Dice Coefficient (DC) [9] criterion. DC measures how close the segmentation result is to the ground truth. This paper is organized as follows. We begin by introducing the concept of Hidden Markov Random Fields in Sect. 2. Section 3 is devoted to the combination of HMRF with the well-known Conjugate Gradient algorithm. Section 4 is dedicated to the experimental results and Sect. 5 concludes the paper.

2 Hidden Markov Random Field (HMRF)

Let \(S=\{s_1,s_2,\dots ,s_{M}\}\) be the set of sites, pixels or positions. Both the image to segment and the segmented image are formed of M sites. Each site \(s \in S\) has a neighborhood set \( V_s (S) \) (see an example in Fig. 1).

Fig. 1. An example of a lattice S; n is the set of sites neighboring s.

A neighborhood system V(S) has the following properties:

$$\begin{aligned} \left\{ \begin{array}{l} \forall s \in S, s \notin V_s(S) \\ \forall \{s, t\} \in S, s \in V_t(S) \Leftrightarrow t \in V_s(S) \end{array} \right. \end{aligned}$$
(1)

An r-order neighborhood system \( V^r(S) \) is defined by the following formula:

$$\begin{aligned} V_s^r(S) = \{ t \in S |\ \text{ distance }(s,t)^2 \le r \wedge s \ne t \} \end{aligned}$$
(2)

where \(\text{ distance }(s,t)\) is the Euclidean distance between pixels s and t. This distance depends only on the pixel positions, i.e., it is not related to the pixel values (see examples in Fig. 2). For volumetric data sets, such as slices acquired by scanners, a 3D neighborhood system is used.
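As a concrete illustration, the following C++ sketch enumerates the r-order neighborhood of Eq. (2) on a 2D lattice of width W and height H; the function name and the row-wise site indexing are our own choices, not part of the original implementation.

```cpp
#include <cmath>
#include <vector>

// r-order neighborhood of site s (Eq. 2): all sites t whose squared
// Euclidean distance to s is at most r, excluding s itself.
// Sites are indexed row-wise: s = row * W + col.
std::vector<int> neighborhood(int s, int r, int W, int H) {
    std::vector<int> V;
    const int row = s / W, col = s % W;
    const int radius = static_cast<int>(std::floor(std::sqrt(static_cast<double>(r))));
    for (int dr = -radius; dr <= radius; ++dr) {
        for (int dc = -radius; dc <= radius; ++dc) {
            if (dr == 0 && dc == 0) continue;           // s is not its own neighbor
            if (dr * dr + dc * dc > r) continue;        // distance(s,t)^2 <= r
            const int nr = row + dr, nc = col + dc;
            if (nr < 0 || nr >= H || nc < 0 || nc >= W) continue; // stay inside S
            V.push_back(nr * W + nc);
        }
    }
    return V;
}
```

With r = 1 this yields the 4-connected first-order system, and with r = 2 the 8-connected second-order system of Fig. 2.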

Fig. 2. First, second, fourth and fifth order neighborhood systems of the site s.

A clique c is a subset of S in which all sites are neighbors of each other. For a non-single-site clique, we have:

$$\begin{aligned} \forall \{s, t\} \in c, s \ne t \Rightarrow (t \in V_s(S) \wedge s \in V_t(S)) \end{aligned}$$
(3)

A p-order clique, denoted \( C_p \), contains p sites, i.e., p is the cardinality of the clique (see an example in Fig. 3).

Fig. 3. Cliques associated with the second order neighborhood system for the site s.

Let \(y=({y}_{1},{y}_{2},{\dots },{y}_{M })\) be the pixel values of the image to segment and \(x=({x}_{1},{x}_{2},{\dots },{x}_{M })\) be the pixel classes of the segmented image, where \(y_i\) and \(x_i\) are respectively the pixel value and the class of the site \(s_i\). The image to segment y and the segmented image x are seen respectively as realizations of the Markov Random Field families \(Y=({Y}_{1},{Y}_{2},{\dots },{Y}_{M })\) and \(X=({X}_{1},{X}_{2},{\dots },{X}_{M })\). The families of random variables \( \{Y_s\}_{s \in S} \) and \( \{X_s\}_{s \in S} \) take their values respectively in the gray-level space \( E_{y}=\{0,\ldots ,255\} \) and the discrete space \(E_x=\left\{ 1,{\dots },K\right\} \), where K is the number of classes or homogeneous regions in the image. The configuration sets of the image to segment y and the segmented image x are respectively \(\varOmega _y=E_{y}^M\) and \(\varOmega _x=E_{x}^M\). Figure 4 shows an example of segmentation into three classes.

Fig. 4. An example of segmentation using the FSL tool.

The segmentation of the image y consists in looking for a realization x of X. HMRF models this problem by maximizing the probability \(P\left[ X=x \;|\; {Y}=y\right] \):

$$\begin{aligned} {x}^*=\mathop {\text {arg}}\limits _{x \in \varOmega _x}{\text {max}}\left\{ P[X=x\;|\;{Y}=y]\right\} \end{aligned}$$
(4)
$$\begin{aligned} \left\{ \begin{array}{l} P[X=x|Y=y] = \text{ A } \exp \left( {-\varPsi (x,y)}\right) \\ \varPsi (x,y) = \sum \nolimits _{ s \in S }{ \left[ \ln (\sigma _{x_s}) + \frac{(y_s - \mu _{x_s})^2}{ 2 \sigma _{x_s}^2} \right] } + \frac{B}{T} \sum \nolimits _{c_2 = \{s,t\}}{ ( 1 - 2 \delta (x_s,x_t) ) }\\ \text{ A } \text{ is } \text{ a } \text{ positive } \text{ constant } \end{array} \right. \end{aligned}$$

where B is a constant, T is a control parameter called temperature, \( \delta \) is the Kronecker delta and \( \mu _{x_s} \), \( \sigma _{x_s} \) are respectively the mean and standard deviation of the class \(x_s\). When \(B>0\), the most likely segmentation corresponds to the formation of large homogeneous regions, whose size is controlled by the value of B.

Maximizing the probability \( P[X=x\;|\;Y=y] \) is equivalent to minimizing the function \( \varPsi (x,y) \).

$$\begin{aligned} {x}^*=\mathop {\text {arg}}\limits _{x \in \varOmega _x}{\text {min}}\left\{ \varPsi (x,y)\right\} \end{aligned}$$
(5)

The computation of the exact segmentation \({x}^*\) is practically impossible [12]. Therefore, optimization techniques are necessary to compute an approximate solution \(\hat{{x}}\).

Let \(\mu =(\mu _1,\dots ,\mu _j,\dots ,\mu _K)\) be the means and \(\sigma =(\sigma _1,\dots ,\sigma _j,\dots ,\sigma _K)\) be the standard deviations of the K classes in the segmented image \(x=(x_1,\dots ,x_s,\dots ,x_M)\), i.e.,

$$\begin{aligned} {} \begin{array}{l} {\left\{ \begin{array}{ll} \mu _j={\frac{1}{|S_j|} \sum \nolimits _{s \in S_j} y_s}\\ \sigma _j=\sqrt{\frac{1}{|S_j|} \sum \nolimits _{s \in S_j} (y_s-\mu _j)^2}\\ S_j=\{s\ |\ x_s=j\} \end{array}\right. } \end{array} \end{aligned}$$
(6)

In our approach, we minimize \(\varPsi (\mu )\), defined below, instead of minimizing \(\varPsi (x,y)\). We can always compute x from \(\mu \) by assigning each \(y_s\) to the nearest mean \(\mu _j\), i.e., \(x_s=j\) if the nearest mean to \(y_s\) is \(\mu _j\). Thus, instead of looking for \(x^*\), we look for \(\mu ^*\). The configuration set of \(\mu \) is \(\varOmega _{\mu }=[0\dots 255]^K\).

$$\begin{aligned} \begin{array}{l} {\left\{ \begin{array}{ll} {\mu }^*=\mathop {\text {arg}}\nolimits _{\mu \in \varOmega _{\mu }}{\text {min}}\left\{ \varPsi (\mu )\right\} \\ \varPsi (\mu ) = \sum \nolimits _{ j = 1 }^{K} f(\mu _j)\\ f(\mu _j) = \sum \limits _{s \in S_j }{ [\ln (\sigma _j ) + \frac{ (y_s - \mu _j)^2 }{ 2 \sigma _j^2 }] } + \frac{B}{T} \sum \nolimits _{c_2 = \{s,t\}}{ ( 1 - 2 \delta (x_s,x_t) ) } \end{array}\right. } \end{array} \end{aligned}$$
(7)

where \(S_j\), \(\mu _j\) and \(\sigma _j\) are defined in Eq. (6).
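For concreteness, the following sketch (with hypothetical names, classes 0-indexed) derives the labeling x from a mean vector \(\mu \) by nearest-mean classification, as described above, and computes the per-class statistics of Eq. (6).

```cpp
#include <cmath>
#include <vector>

struct ClassStats { double mu; double sigma; };

// Assign each pixel to the class whose mean is closest to its gray level,
// then recompute the mean and standard deviation of each class (Eq. 6).
std::vector<ClassStats> classifyAndStats(const std::vector<double>& y,
                                         const std::vector<double>& mu,
                                         std::vector<int>& x) {
    const int K = static_cast<int>(mu.size());
    x.assign(y.size(), 0);
    std::vector<double> sum(K, 0.0), sumSq(K, 0.0);
    std::vector<int> count(K, 0);
    for (std::size_t s = 0; s < y.size(); ++s) {
        int best = 0;
        for (int j = 1; j < K; ++j)
            if (std::abs(y[s] - mu[j]) < std::abs(y[s] - mu[best])) best = j;
        x[s] = best;                       // x_s = j if mu_j is the nearest mean
        sum[best]   += y[s];
        sumSq[best] += y[s] * y[s];
        ++count[best];
    }
    std::vector<ClassStats> stats(K, ClassStats{0.0, 0.0});
    for (int j = 0; j < K; ++j) {
        if (count[j] == 0) continue;       // empty class S_j: leave stats at zero
        stats[j].mu = sum[j] / count[j];
        const double var = sumSq[j] / count[j] - stats[j].mu * stats[j].mu;
        stats[j].sigma = std::sqrt(var > 0.0 ? var : 0.0);
    }
    return stats;
}
```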

To apply unconstrained optimization techniques, we redefine the function \(\varPsi (\mu )\) for \(\mu \in \mathbb R^K\) instead of \(\mu \in [0 \dots 255]^K\), as recommended in [4]. Therefore, the new function \(\varPsi (\mu )\) becomes:

$$\begin{aligned} \varPsi (\mu ) =\sum _{ j = 1 }^{K} F(\mu _j)\ \ \text{ where } \mu _j \in \mathbb R\end{aligned}$$
(8)
$$\begin{aligned} F(\mu _j)= {\left\{ \begin{array}{ll} f(0)-\mu _j\cdot 10^3 &{} \text{ if } \mu _j < 0\\ f(\mu _j) &{} \text{ if } \mu _j \in [0\dots 255]\\ f(255)+(\mu _j-255)\cdot 10^3 &{} \text{ if } \mu _j > 255 \end{array}\right. } \end{aligned}$$
(9)
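A minimal sketch of this extension is given below; the restriction of the objective to [0, 255] is assumed to be available as a callable f, since its evaluation (Eq. 7) depends on the current labeling and class statistics.

```cpp
#include <functional>

// Extension of f to the whole real line (Eqs. 8-9): outside [0, 255] the
// objective grows linearly with slope 1e3, which pushes the minimizer
// back into the admissible gray-level range.
double F(double mu_j, const std::function<double(double)>& f) {
    const double penalty = 1e3;
    if (mu_j < 0.0)   return f(0.0)   - mu_j * penalty;
    if (mu_j > 255.0) return f(255.0) + (mu_j - 255.0) * penalty;
    return f(mu_j);
}
```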

3 Hidden Markov Random Field and Conjugate Gradient algorithm (HMRF-CG)

To solve the minimization problem expressed in Eq. 7, we use the nonlinear conjugate gradient method, which generalizes the conjugate gradient method to nonlinear optimization. The algorithm is summarized below.

Let \(\mu ^0\) be the initial point and \(d^0=-\varPsi ^{'}(\mu ^{0})\) be the first search direction.

Calculate the step size \(\alpha ^k\) that minimizes \(\varphi _k(\alpha )\). It is found by ensuring that the gradient is orthogonal to the search direction \(d^k\).

$$\begin{aligned} \varphi _k(\alpha )=\varPsi (\mu ^k+\alpha d^k) \end{aligned}$$
(10)

At iteration \(k + 1\), calculate \(\mu ^{k+1}\) as follows:

$$\begin{aligned} \mu ^{k+1}=\mu ^{k}+\alpha ^kd^k \end{aligned}$$
(11)

Calculate the residual or the steepest direction:

$$\begin{aligned} r^{k+1}=-\varPsi ^{'}(\mu ^{k+1}) \end{aligned}$$
(12)

Calculate the search direction \(d^{k+1}\) as follows:

$$\begin{aligned} d^{k+1}=r^{k+1}+\beta ^{k+1} d^k \end{aligned}$$
(13)

In the conjugate gradient method, there are several variants for computing \(\beta ^{k+1}\), for example (a short code sketch of these updates follows the list):

  1. The Fletcher-Reeves conjugate gradient method:

    $$\begin{aligned} \beta ^{k+1}= \frac{\left( r^{k+1}\right) ^T r^{k+1}}{\left( r^{k}\right) ^T r^{k}} \end{aligned}$$
    (14)
  2. The Polak-Ribière conjugate gradient method:

    $$\begin{aligned} \beta ^{k+1}= \max \left\{ \frac{\left( r^{k+1}\right) ^T \left( r^{k+1}-r^{k}\right) }{\left( r^{k}\right) ^T r^{k}}, 0 \right\} \end{aligned}$$
    (15)
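The fragment below is our own illustrative helper (not the GSL routine used later): it applies the Polak-Ribière formula (15) and the direction update (13) to vectors stored as std::vector<double>.

```cpp
#include <algorithm>
#include <vector>

// One Polak-Ribiere direction update: given the previous residual r_k,
// the new residual r_{k+1} and the previous direction d_k, compute
// beta^{k+1} (Eq. 15) and the new search direction (Eq. 13).
std::vector<double> prDirection(const std::vector<double>& rOld,
                                const std::vector<double>& rNew,
                                const std::vector<double>& dOld) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < rOld.size(); ++i) {
        num += rNew[i] * (rNew[i] - rOld[i]);
        den += rOld[i] * rOld[i];
    }
    const double beta = std::max(den > 0.0 ? num / den : 0.0, 0.0);
    std::vector<double> d(rNew.size());
    for (std::size_t i = 0; i < d.size(); ++i)
        d[i] = rNew[i] + beta * dOld[i];   // d^{k+1} = r^{k+1} + beta^{k+1} d^k
    return d;
}
```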

To use the conjugate gradient algorithm, we need the first derivative \(\varPsi ^{'}(\mu )=(\varDelta _1,\dots ,\varDelta _i,\dots ,\varDelta _K)\). Since no closed-form expression is available, it is approximated with finite differences [10]. In our tests, we have used a centered-difference approximation to compute the first derivative as follows:

$$\begin{aligned} \varDelta _i=\frac{\varPsi (\mu _1,\dots ,\mu _i+\varepsilon ,\dots ,\mu _K)-\varPsi (\mu _1,\dots ,\mu _i-\varepsilon ,\dots ,\mu _K)}{2\varepsilon } \end{aligned}$$
(16)
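A direct transcription of this centered-difference scheme is sketched below; \(\varPsi \) is assumed to be available as a callable over \(\mathbb R^K\), and the default \(\varepsilon \) matches the value retained in our tests.

```cpp
#include <functional>
#include <vector>

// Centered finite-difference approximation of the gradient of Psi (Eq. 16).
std::vector<double> gradient(const std::function<double(const std::vector<double>&)>& Psi,
                             std::vector<double> mu,   // taken by value: perturbed in place
                             double eps = 0.01) {
    std::vector<double> g(mu.size());
    for (std::size_t i = 0; i < mu.size(); ++i) {
        const double saved = mu[i];
        mu[i] = saved + eps; const double fPlus  = Psi(mu);
        mu[i] = saved - eps; const double fMinus = Psi(mu);
        mu[i] = saved;
        g[i] = (fPlus - fMinus) / (2.0 * eps);
    }
    return g;
}
```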

A good approximation of the first derivative relies on the choice of the parameter \(\varepsilon \). Through the tests conducted, we selected 0.01 as the best value. In practice, the application is implemented in C++ with the cross-platform Qt Creator under Linux. We have used the GNU Scientific Library (GSL) implementation of the Polak-Ribière conjugate gradient method [13, 32]. The HMRF-CG method is summarized hereafter, followed by a minimal GSL-based code sketch.

  • Input:

    y the image to segment, K the number of classes, B the constant parameter of HMRF, T the control parameter of HMRF, \(\mu ^0\) the initial point, \(\varepsilon \) the parameter used by the first derivative approximation.

  • Initialization:

    we define s as a minimizer structure of type gsl_multimin_fdfminimizer and initialize it with: K the problem size, \(\varPsi \) the function to minimize (Eq. 8), \(\varPsi ^{'}\) the first derivative of the function to minimize (Eq. 16), \(\mu ^0\) the starting point, and gsl_multimin_fdfminimizer_conjugate_pr (Polak-Ribière conjugate gradient) as the minimizer type.

  • Iterations:

    we perform one iteration to update the state of the minimizer using the function gsl_multimin_fdfminimizer_iterate(s); after that, we test s for convergence.

  • The stopping criterion:

    in our case, the minimization procedure stops when the norm of the gradient (\(||\varPsi ^{'}||\)) is less than \(10^{-3}\).

  • Output:

    an approximation \(\hat{\mu }\) of \({\mu }^* \in \mathbb R^K\), and \(\hat{x}\) the segmented image computed from \(\hat{\mu }\).
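The fragment below sketches how such a minimizer is set up with the GSL multimin interface; psi, psi_df and psi_fdf stand for user-written wrappers around Eq. (8) and Eq. (16), and the step size and line-search tolerance passed to gsl_multimin_fdfminimizer_set are illustrative choices of ours, not values from the original implementation.

```cpp
#include <gsl/gsl_errno.h>
#include <gsl/gsl_multimin.h>

// Assumed user-written wrappers: psi evaluates Eq. (8), psi_df fills the
// centered-difference gradient of Eq. (16), and psi_fdf does both at once.
double psi(const gsl_vector* mu, void* params);
void   psi_df(const gsl_vector* mu, void* params, gsl_vector* grad);
void   psi_fdf(const gsl_vector* mu, void* params, double* f, gsl_vector* grad);

void minimize(gsl_vector* mu0, void* params, size_t K) {
    gsl_multimin_function_fdf func;
    func.n      = K;        // problem size: one mean per class
    func.f      = psi;
    func.df     = psi_df;
    func.fdf    = psi_fdf;
    func.params = params;

    // Polak-Ribiere conjugate gradient minimizer, as used in HMRF-CG.
    gsl_multimin_fdfminimizer* s =
        gsl_multimin_fdfminimizer_alloc(gsl_multimin_fdfminimizer_conjugate_pr, K);
    gsl_multimin_fdfminimizer_set(s, &func, mu0, 0.01, 1e-4); // illustrative step and tolerance

    int status;
    int iter = 0;
    do {
        ++iter;
        status = gsl_multimin_fdfminimizer_iterate(s);            // one CG iteration
        if (status) break;                                         // cannot improve further
        status = gsl_multimin_test_gradient(s->gradient, 1e-3);    // stop when ||Psi'|| < 1e-3
    } while (status == GSL_CONTINUE && iter < 1000);

    // s->x now holds the approximation of mu*; copy it out before freeing if needed.
    gsl_multimin_fdfminimizer_free(s);
}
```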

4 Experimental Results

In this section, we show the effectiveness of the HMRF-CG method. To this end, we first compare it with two methods: improved k-means and MRF-ACO-Gossiping [41]. Next, we show the robustness of HMRF-CG against noise by comparing it with FSL FAST (FMRIB's Automated Segmentation Tool) and LGMM (Local Gaussian Mixture Model) [23]. To perform a fair and meaningful comparison, we use a metric known as the Dice Coefficient [9]; Morey et al. [27] use the Dice coefficient and the percentage volume overlap interchangeably. This metric is usable only when the ground-truth segmentation is known (see Sect. 4.1). The image sets and related parameters are described in Sect. 4.2. Finally, Sect. 4.3 is devoted to the results.

4.1 Dice Coefficient Metric

The Dice Coefficient (DC) measures how close a result is to the ground truth. Let the resulting class be \(\hat{A}\) and its ground truth be \(A^*\). The Dice Coefficient is given by the following formula:

$$\begin{aligned} DC = \frac{ 2\, |\hat{A} \cap A^*| }{\ |\hat{A}| + |A^*| } \end{aligned}$$
(17)
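As an illustration, the per-class DC can be computed from two label images as follows; the function and variable names are our own.

```cpp
#include <vector>

// Dice Coefficient for class k between a segmentation and its ground truth
// (Eq. 17): twice the overlap divided by the sum of the two class sizes.
double diceCoefficient(const std::vector<int>& seg,
                       const std::vector<int>& truth,
                       int k) {
    std::size_t inter = 0, sizeSeg = 0, sizeTruth = 0;
    for (std::size_t s = 0; s < seg.size(); ++s) {
        const bool inSeg = (seg[s] == k), inTruth = (truth[s] == k);
        if (inSeg)            ++sizeSeg;
        if (inTruth)          ++sizeTruth;
        if (inSeg && inTruth) ++inter;
    }
    const std::size_t denom = sizeSeg + sizeTruth;
    return denom == 0 ? 1.0 : 2.0 * static_cast<double>(inter) / denom;
}
```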
Table 1. Parameters related to the images used in our tests.
Table 2. Mean DC values for the IBSR image (the best results are given in bold type).
Table 3. Mean DC values for the BrainWeb images (the best results are given in bold type).

4.2 The Image Sets and Related Parameters

To evaluate the quality of segmentation, we use four volumetric (3D) MR images: one obtained from IBSR (a real image) and the others from BrainWeb (simulated images). Three components are considered: GM (Grey Matter), WM (White Matter) and CSF (Cerebrospinal Fluid). The IBSR image has the following characteristics: dimensions of \(256\times 256\times 63\), voxel size of \(1\times 3\times 1\) mm and T1-weighted modality. The three BrainWeb image sets BrainWeb1, BrainWeb2 and BrainWeb3 have the following characteristics: dimensions of \(181\times 217\times 181\), voxel size of \(1\times 1\times 1\) mm and T1-weighted modality. They have different levels of noise and intensity non-uniformity, respectively: (\(0\%\), \(0\%\)), (\(3\%\), \(20\%\)) and (\(5\%\), \(20\%\)). In this paper, we have retained the subset of slices cited in [41]. The IBSR slices retained are: 1-24/18, 1-24/20, 1-24/24, 1-24/26, 1-24/30, 1-24/32 and 1-24/34. The BrainWeb slices retained are: 85, 88, 90, 95, 97, 100, 104, 106, 110, 121 and 130.

Table 1 defines some parameters necessary to execute HMRF-CG method.

Fig. 5. (a) Slice numbers, (b) slices to segment from IBSR, (c) ground truth slices, (d) segmented slices using HMRF-CG.

Fig. 6. Slice #95 of the BrainWeb images. (a) Noise and intensity non-uniformity, (b) slices to segment from the BrainWeb images, (c) segmented slices using HMRF-CG.

4.3 Results

Table 2 shows the mean DC values for the IBSR image. The parameters used by HMRF-CG are described in Table 1. The parameters used by the other methods are given in [41].

Table 3 shows the mean DC values for the BrainWeb images. The parameters used by HMRF-CG are described in Table 1, and those used by the LGMM method are given in [23]. The implementation of LGMM is built upon the segmentation method [2] of SPM 8 (Statistical Parametric Mapping), a well-known software package for MRI analysis. As reported in [23], LGMM yields better results than SPM 8.

Figure 5 shows a sample of slices to segment from the IBSR image, their ground truths and their segmentation using the HMRF-CG method.

Figure 6 shows slice #95 of the BrainWeb images with different levels of noise and intensity non-uniformity, together with its segmentation using HMRF-CG.

5 Discussion and Conclusion

In this paper, we have described a method that combines Hidden Markov Random Fields (HMRF) and the Conjugate Gradient algorithm (CG). The tests were carried out on samples obtained from the IBSR and BrainWeb images, the most commonly used images in the field. For a fair and meaningful comparison of methods, the segmentation quality is measured using the Dice Coefficient metric. The results depend on the choice of parameters; this very sensitive task was carried out by performing numerous tests. From the results obtained, the HMRF-CG method outperforms the methods tested, namely LGMM, classical MRF, MRF-ACO-Gossiping and MRF-ACO. The tests also allowed us to find good parameters for HMRF-CG to achieve good segmentation results. To further improve performance, a preprocessing step could be added to reduce noise and intensity inhomogeneity using appropriate filters.