1 Introduction

Many medical image analysis methods rely on the hypothesis that corresponding anatomical structures present similar intensity profiles. Unlike computed tomography, however, magnetic resonance imaging does not in general produce scans on an absolute standard scale: even with the same imaging protocol, there can be significant variation between scanners. Acquisition parameters have a complex effect on the luminance of the acquired images, so a simple linear rescaling of intensities is usually insufficient for effective data harmonisation [5]. Accurate nonlinear intensity normalisation is therefore a crucial factor in enabling the construction of large-scale image databases from multiple sites.

A number of different approaches have been introduced for this task (cf. [1]), the most widely-adopted of which is that of Nyúl et al. [7]. The authors proposed to normalise intensities by matching a set of histogram quantiles, using these as landmarks for a piecewise linear transformation. Despite its apparent simplicity, it has proven very effective in clinical applications [9].

Our proposed method, nonparametric density flows (NDFlow), is perhaps conceptually closest to [5], which involves matching Gaussian mixture models (GMMs) fitted to a pair of image histograms. The author used a finite mixture to represent a predefined set of five tissue classes, whereas we propose to use nonparametric mixtures, focusing on accurately modelling the density rather than discriminating tissue types, and sidestepping the problem of pre-selecting the number of components. A further difference is that, instead of polynomially interpolating between the means of corresponding components, we build a smooth transformation model based on density flows.

2 Method

We begin by justifying and describing the density model used to represent the intensity distributions to be matched. We then introduce the chosen objective function with its gradients for optimisation. Finally, we present our flow-based transformation model, which deforms the data so it conforms to the matched density model. Note that we focus here on single-modality intensity normalisation, although the entire formulation below extends naturally to the multivariate case.

Fig. 1. Comparison of two MRI scans, before and after the proposed NDFlow normalisation. Right: histograms (shaded) and fitted mixture models (dotted: likelihood, solid: mixture components).

2.1 Intensity Model

In order to be able to match the intensity distributions of a pair of images, a suitable probability density model is required. Typically, finite mixture models are considered for this task [5, 8]. However, a well-known limitation of these is the requirement to specify a priori a fixed number of components, which may in addition call for an iterative model selection loop (e.g. [8]).

At the opposite end of the spectrum lies kernel density estimation, which is widespread in shape registration (e.g. [4, 6]). However, this formulation would result in an unwieldy optimisation problem, involving thousands or millions of parameters and all pairwise interactions. Furthermore, the derived transformation would likely not be satisfactorily smooth without additional regularisation.

To overcome both issues we propose to use Dirichlet process Gaussian mixture models (DPGMMs) [3]. Instead of specifying a fixed number of components, they rely on a vague concentration parameter, which regulates the expected amount of clustering fragmentation and enables them to adapt their complexity to the data at hand. By allowing an unbounded number of components and setting a versatile prior on the mixture proportions, they strike a parsimonious middle ground between flexibility and tractability.

We fit the DPGMMs to each image’s intensities using variational inference [2]. More specifically, we implemented an efficient weighted variant to fit a mixture directly to each 1D histogram.
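As a concrete illustration, the sketch below fits such a model with scikit-learn's variational DP-GMM. Our weighted variational variant is not available in scikit-learn, so this stand-in expands the histogram into repeated samples before fitting; the function name, truncation level and iteration budget are illustrative choices, and the pruning threshold anticipates Sect. 3.2.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_dpgmm(bin_centres, counts, max_components=30, concentration=2.0):
    """Fit a truncated DP Gaussian mixture to a 1D intensity histogram."""
    # scikit-learn has no weighted fit, so expand each histogram bin into
    # repeated samples (a crude stand-in for weighted variational updates).
    x = np.repeat(bin_centres, counts).reshape(-1, 1)
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,  # truncation level of the DP
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=concentration,
        max_iter=500,
    ).fit(x)
    # Prune leftover components with negligible weight (cf. Sect. 3.2)
    keep = dpgmm.weights_ > 1e-3
    w = dpgmm.weights_[keep] / dpgmm.weights_[keep].sum()
    return w, dpgmm.means_[keep, 0], dpgmm.precisions_[keep, 0, 0]
```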

2.2 Density Matching

The first step is to perform a coarse affine alignment by matching the moving density’s first and second moments to the target’s, accounting for arbitrary translation and rescaling of the values. This same affine transformation is then also applied to the data before the nonlinear warping takes place.
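A minimal sketch of this moment-matching step (function name ours). The same scale and shift would also be applied to the mixture parameters: means transform affinely, while precisions are divided by the squared scale.

```python
import numpy as np

def affine_align(moving, target_mean, target_std):
    """Match the first and second moments of `moving` to the target's."""
    scale = target_std / moving.std()
    shift = target_mean - scale * moving.mean()
    return scale * moving + shift  # apply the same map to data and density
```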

We quantify the disagreement between two probability density functions q and p on a probability space \(\mathcal {X}\) by means of the \(L^2\) divergence:

$$\begin{aligned} D_{L^2}[q, p] = \Vert q - p \Vert ^2 = \langle q, q \rangle - 2 \langle q, p \rangle + \langle p, p \rangle , \end{aligned}$$

(1)

where \(\langle q, p \rangle = \int _{\mathcal {X}} q(x) \, p(x) \, \mathrm {d}x\) is the \(L^2\) inner product and \(\Vert q \Vert = \sqrt{\langle q, q \rangle }\) is its induced norm. Aside from being symmetric, this quantity is positive and reaches zero iff \(q \overset{\text {a.e.}}{=} p\). Crucially, unlike the usual Kullback–Leibler divergence, it is expressible in closed form for Gaussian mixture densities.

Let \(q = \sum _k \pi _k q_k\) and \(p = \sum _m \tau _m p_m\) denote two Gaussian mixtures, with components \(q_k = \mathcal {N}(\mu _k, \lambda _k^{-1})\) and \(p_m = \mathcal {N}(\nu _m, \omega _m^{-1})\). Equation (1) has tractable gradients w.r.t. the parameters of q, which we use to optimise its components’ means \(\{\mu _k\}_k\) and precisions \(\{\lambda _k\}_k\) (cf. extended version).
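Since the product of two Gaussian densities integrates to a Gaussian evaluated at the difference of the means, \(\langle q_k, p_m \rangle = \mathcal {N}(\mu _k \mid \nu _m, \lambda _k^{-1} + \omega _m^{-1})\), Eq. (1) can be evaluated exactly. The following sketch (function names ours) does so with NumPy/SciPy, representing each mixture as a (weights, means, precisions) triple:

```python
import numpy as np
from scipy.stats import norm

def gmm_inner(w1, mu1, lam1, w2, mu2, lam2):
    """<q, p> for mixtures given as (weights, means, precisions)."""
    var = 1.0 / lam1[:, None] + 1.0 / lam2[None, :]
    dens = norm.pdf(mu1[:, None], loc=mu2[None, :], scale=np.sqrt(var))
    return np.sum(w1[:, None] * w2[None, :] * dens)

def l2_divergence(q, p):
    """Eq. (1): ||q - p||^2 for two Gaussian mixture parameter triples."""
    return gmm_inner(*q, *q) - 2.0 * gmm_inner(*q, *p) + gmm_inner(*p, *p)
```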

We have found, in practice, that it is largely unnecessary to adapt the mixing proportions, \(\{\pi _k\}_k\), to obtain excellent agreement between mixture densities. In fact, changing the mixture weights would require transferring samples between mixture components. Although surely possible, in the context of histogram matching this would imply altering their semantic value (e.g. consider a mixture of two well-separated components representing different tissue types).

2.3 Warping

After matching one GMM to another, we also need a way to transform the data modelled by that GMM so it matches the target data. To this end, we draw inspiration from fluid mechanics and define the warping transformation, f, as the trajectories of particles under the effect of a velocity field u over time, taking the probability density q for the mechanical mass density. The key property that such a flow must satisfy is conservation of mass: \(\partial _t q + \partial _x (q u) = 0\), where \(t \mapsto q^{(t)}\) is specified directly from the density matching.

Let us first consider the case of warping a single mixture component. A random variable \(x \sim \mathcal {N}(\mu _k, \lambda _k^{-1})\) can be expressed via a diffeomorphic reparametrisation of a standard Gaussian, with \(x = \psi _k(\epsilon ) = \mu _k + \epsilon / \sqrt{\lambda _k}\) and \(\epsilon \sim \mathcal {N}(0, 1)\). Assuming its mean and precision are changing with rates \(\dot{\mu }_k\) and \(\dot{\lambda }_k\), respectively, we can introduce a velocity field \(u_k = \dot{\psi }_k \circ \psi _k^{-1}\) for its samples so that they agree with this evolving density. The instantaneous velocity at ‘time’ t is thus given by

$$\begin{aligned} u_k^{(t)}(x) = \dot{\mu }_k^{(t)} - \frac{\dot{\lambda }_k^{(t)}}{2 \lambda _k^{(t)}} \big (x - \mu _k^{(t)}\big ). \end{aligned}$$
(2)
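For completeness, Eq. (2) follows by differentiating the reparametrisation in time at fixed \(\epsilon \) and substituting \(\epsilon = \psi _k^{-1}(x) = \sqrt{\lambda _k} \, (x - \mu _k)\) (our reconstruction of the intermediate step):

$$\begin{aligned} \dot{\psi }_k(\epsilon ) = \dot{\mu }_k - \frac{\dot{\lambda }_k}{2 \lambda _k^{3/2}} \, \epsilon \quad \Rightarrow \quad u_k = \dot{\psi }_k \circ \psi _k^{-1} = \dot{\mu }_k - \frac{\dot{\lambda }_k}{2 \lambda _k} \, (x - \mu _k). \end{aligned}$$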

In the case of a mixture with constant weights \(\{\pi _k\}_k\), we can construct a smooth, mass-conserving global velocity field u as

$$\begin{aligned} u^{(t)}(x) = \sum _k \frac{\pi _k q_k^{(t)}(x)}{q^{(t)}(x)} \, u_k^{(t)}(x), \end{aligned}$$
(3)

which is simply a point-wise convex combination of each component’s velocity field, \(u_k\), weighted by the corresponding posterior assignment probabilities.
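A sketch of Eqs. (2)–(3) in code (names ours). Each row of `resp` holds the posterior assignment probabilities \(\pi _k q_k(x) / q(x)\); the parameter arrays hold the component values at the current ‘time’, with their rates passed in separately:

```python
import numpy as np

def gauss_pdf(x, mu, lam):
    """Gaussian density with mean mu and precision lam."""
    return np.sqrt(lam / (2 * np.pi)) * np.exp(-0.5 * lam * (x - mu) ** 2)

def velocity(x, pi, mu_t, lam_t, mu_dot, lam_dot):
    """Global velocity field u^(t) at points x (Eqs. (2)-(3))."""
    # Eq. (2): per-component velocities, shape (n_points, n_components)
    u_k = mu_dot - lam_dot / (2 * lam_t) * (x[:, None] - mu_t)
    # Posterior assignment probabilities pi_k q_k(x) / q(x)
    resp = pi * gauss_pdf(x[:, None], mu_t, lam_t)
    resp /= resp.sum(axis=1, keepdims=True) + 1e-300  # guard empty tails
    # Eq. (3): point-wise convex combination of the component velocities
    return (resp * u_k).sum(axis=1)
```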

Finally, the warping transformation \(f^{(t)}\) is given by the solution to the following ordinary differential equation (ODE):

$$\begin{aligned} \partial _t f^{(t)}(x) = u^{(t)}(f^{(t)}(x)) \,, \quad f^{(0)}(x) = x. \end{aligned}$$
(4)

With f defined as above, we can prove that \(q^{(t)}\) is indeed the density of samples from \(q^{(0)}\) transformed through \(f^{(t)}\), i.e. \({q^{(0)} = |\partial _x f^{(t)}| \, q^{(t)} \circ f^{(t)}}\) (cf. extended version). Crucially, the true solution to Eq. (4) is diffeomorphic by construction, and can be numerically approximated (and inverted) with arbitrary precision. In particular, we employ the classic fourth-order Runge–Kutta ODE solver (RK4).

Now assume we obtain optimal parameter values \(\{\mu _k^*\}_k\) and \(\{\lambda _k^*\}_k\) after matching q to p. We can then warp the data using the above approach, for example linearly interpolating the intermediate parameter values, \({\mu _k^{(t)} = t \mu _k^* + (1-t) \mu _k^{(0)}}\) and \({\lambda _k^{(t)} = t \lambda _k^* + (1-t) \lambda _k^{(0)}}\), hence setting the rates in Eq. (2) to constant values, \({\dot{\mu }_k = \mu _k^* - \mu _k^{(0)}}\) and \({\dot{\lambda }_k = \lambda _k^* - \lambda _k^{(0)}}\), and integrating Eq. (4) for \(t \in [0, 1]\).
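A sketch of this final integration step with a fixed-step RK4 solver, reusing the `velocity` function above; the step count is an arbitrary choice of ours.

```python
import numpy as np

def warp(x, pi, mu0, lam0, mu_star, lam_star, n_steps=20):
    """Transport points x through the flow of Eq. (4) for t in [0, 1]."""
    mu_dot, lam_dot = mu_star - mu0, lam_star - lam0   # constant rates
    h = 1.0 / n_steps
    f = np.asarray(x, dtype=float).copy()

    def u(y, t):  # velocity under linear interpolation of the parameters
        mu_t = (1 - t) * mu0 + t * mu_star
        lam_t = (1 - t) * lam0 + t * lam_star
        return velocity(y, pi, mu_t, lam_t, mu_dot, lam_dot)

    for i in range(n_steps):  # classic fixed-step RK4
        t = i * h
        k1 = u(f, t)
        k2 = u(f + 0.5 * h * k1, t + 0.5 * h)
        k3 = u(f + 0.5 * h * k2, t + 0.5 * h)
        k4 = u(f + h * k3, t + h)
        f += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return f
```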

2.4 Practical Considerations

Since each medical image in a dataset can have millions of voxels, computing the posteriors and flows for every voxel individually can be too expensive for batch processing. To mitigate this issue, we can compute the end-to-end transformation on a mesh in the range of interest, which is then interpolated for the intensities in the entire volume. In the reported experiments, we have used a uniformly-spaced mesh of 200 points, which has proven accurate enough for normalisation purposes.
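A sketch of this mesh-based shortcut, reusing the hypothetical `warp` function above; `gmm_params` stands for the matched mixture parameter arrays:

```python
import numpy as np

def normalise_volume(volume, gmm_params, n_mesh=200):
    """Warp all voxel intensities via a coarse mesh over the value range."""
    mesh = np.linspace(volume.min(), volume.max(), n_mesh)
    warped = warp(mesh, *gmm_params)   # expensive flow, only n_mesh points
    # Piecewise-linear interpolation acts as a smooth look-up for all voxels
    out = np.interp(volume.ravel(), mesh, warped)
    return out.reshape(volume.shape)
```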

Note that the transformation could also be computed on the histogram of discrete intensity values and built into a look-up table. However, this would not scale well to two or more dimensions for multi-modal intensity normalisation, whereas a mesh would not need to be very fine nor require a regular grid layout.

3 Experiments

3.1 Dataset

Our experiments were run on 581 T1-weighted MRI scans from the IXI database, collected from three imaging centres with different scanners. Each scan was bias field-corrected using SPM12 with default settings and rigidly registered to MNI space. SPM12 was further used to produce grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) tissue probability maps. We obtained brain masks by adding the three probability maps and thresholding at 0.5. The statistics reported below were weighted by the voxel-wise tissue probabilities to account for partial-volume effects and segmentation ambiguities.

3.2 Setup

We firstly fitted the nonparametric mixture models to the full integer-value histograms of the raw images (inside the brain masks), as described in Sect. 2.1. We set the DP’s concentration parameter to 2 and used data-driven Normal–Gamma priors for the components. As an ad-hoc post-processing step, we pruned the leftover mixture components with weights smaller than \(10^{-3}\). In the absence of one global reference distribution, we affinely aligned these DPGMMs and the corresponding data to zero mean and unit variance (cf. Fig. 2, middle).

Fig. 2. Population densities, colour-coded by imaging centre

After this rough alignment, global and centre-wise average densities were computed. These were then treated as histograms, to which we fitted global and centre-wise reference DPGMMs.

For normalisation, we consider two scenarios. The first is to normalise each centre’s reference distribution to the global target, then to apply this same transformation to all subjects in that centre. In the second, each subject’s image is individually normalised to the global target density. These scenarios reflect different practical applications: centre-wise normalisation preserves intra-centre variation, which may be desirable, whereas individual normalisation aims to make all scans as similar as possible.

We compare our technique to Nyúl et al.’s prevalent quantile-based, piecewise linear histogram matching method [7], considered state-of-the-art for intensity normalisation and referred to here as Nyul. We extracted the default 11 landmarks (histogram deciles and upper/lower percentiles) from the affine-aligned data for all subjects, then normalised each subject to this set of average landmarks.
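For reference, a minimal sketch of this baseline as we understand it (our simplified rendition): map the image's 11 landmarks onto the averaged target landmarks via piecewise linear interpolation.

```python
import numpy as np

PERCENTILES = [1] + list(range(10, 100, 10)) + [99]   # 11 landmarks

def nyul_normalise(image, target_landmarks):
    """Piecewise-linearly map the image's landmarks onto the targets."""
    src = np.percentile(image, PERCENTILES)
    # Note: np.interp clamps values outside the outer landmarks, whereas
    # the original method extends the first and last linear segments.
    out = np.interp(image.ravel(), src, target_landmarks)
    return out.reshape(image.shape)

# target_landmarks would be the mean of np.percentile(img, PERCENTILES)
# over the affine-aligned training images.
```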

3.3 Results

Histogram Fitness. Fig. 3 illustrates the results of normalisation between the pair of images in Fig. 1, whose histograms differ notably in the CSF region. We observe that both our NDFlow- and Nyul-transformed histograms present substantially lower mean absolute and root mean squared errors (MAE and RMSE) than the affine-aligned one, with our method performing best by a small margin. This was confirmed in a number of trials with other images.

Fig. 3. Histograms and Q–Q plots of each of the methods against the target histogram. The shading shows the discrepancy between the transformed (black) and target (light red) histograms. In the rightmost plot, the landmarks are indicated by vertical lines in the histogram and ticks in the Q–Q plot.

A noteworthy artefact of Nyul is the abrupt jumps produced at the landmark values (e.g. Fig. 3c), which appear because the intervals between landmarks are uniformly compressed or dilated by different factors; these jumps may be detrimental to downstream histogram-based tasks (e.g. mutual information registration). NDFlow causes no such discontinuities, owing to the smoothness of the mass-conserving flows.

Tissue Statistics. In Table 1 we report the WM, GM and CSF intensity statistics for different normalisations. Firstly, we see that the centre-wise normalisation had a small but significant effect on the overall distribution statistics. More importantly, the variances of the statistics after individual NDFlow and Nyul transformations were typically similar, and both were almost always substantially smaller than the variance after only affine alignment, with the exception of CSF.

Table 1. Tissue statistics after normalisation (mean ± std. dev., \(N = 581\))

It is known that the amount of intra-cranial fluid can vary substantially due to factors such as age and certain neurodegenerative conditions, and this is reflected in the distributions of intensities in brain MRI scans, as is evident in Fig. 2. As a result, normalising all subjects to a ‘mean’ distribution fails to identify a consistent reference range for CSF intensities.

A fundamental limitation of any histogram matching scheme is that it is unclear how to proceed when the distributions are genuinely different. Intensity distributions can be strongly affected by anatomical differences; for example, we can observe large variations in the amounts of fluid and fat in brain or whole-body scans, which may heavily skew the overall distributions (a moderate example is shown in Fig. 3). The underlying assumption of these methods (including ours) is that the distributions are similar up to an affine rescaling and a mild nonlinear deformation of the values; handling histograms of truly different shapes remains an open challenge. For images with different fields of view, it may be beneficial to perform image registration before applying intensity normalisation.

Centre Classification. To evaluate the effectiveness of intensity normalisation for data harmonisation, we conducted a centre discrimination experiment with random forest classifiers trained on the full images. We report the pooled test results from two-fold cross validation (detailed results in extended version).

Relative to affine normalisation, centre-wise and individual NDFlow and Nyul showed a slight drop in overall classification accuracy (94.1% vs. 92.7%, 93.6%, 92.9%, resp.). On the other hand, the uncertainty, as measured by the entropy of the predictions, was significantly higher (paired t-test, all \(p<.01\)). Nonlinear intensity normalisation therefore seems to successfully remove some of the biasing factors which are discriminative of the origin of the images.

4 Conclusion

In this paper, we have introduced a novel method for MRI intensity normalisation, called nonparametric density flows (NDFlow). It is based on fitting and matching Dirichlet process Gaussian mixture densities, by minimising their \(L^2\) divergence, and on mass-conserving flows, which ensure that the empirical intensity distribution agrees with the matched density model.

We demonstrated that our normalisation approach makes tissue intensity statistics significantly more consistent across subjects than a simple affine alignment, and compares favourably to the state-of-the-art method of Nyúl et al. [7]. We have additionally verified that NDFlow is able to accurately match histograms without introducing spurious artefacts produced by the competing method. Finally, we argued that both normalisation techniques can reduce some discriminative scanner biases, in a step toward effective data harmonisation.

By employing nonparametric mixture models, we are able to represent arbitrary histogram shapes with any number of modes. In addition, our formulation has the flexibility to match only part of the distributions, by freezing the parameters of some mixture components. This may be useful for ignoring lesion-related modes (e.g. multiple sclerosis hyperintensities), if the corresponding components can be identified (e.g., via anomaly detection). Evaluating this approach and its robustness against lesion load is a compelling direction for further research.