1 Introduction

Arterial spin-labelling magnetic resonance imaging (ASL-MRI) is a non-invasive blood flow imaging modality that can improve the diagnosis of Alzheimer’s disease (AD) by providing estimates of cerebral blood flow (CBF) and identifying regions of chronic cerebral hypoperfusion in individuals. However, ASL-MRI is still not part of clinical routine; many research databases used to train dementia classifiers, such as ADNI (Alzheimer’s Disease Neuroimaging Initiative), do not include ASL-MRI as part of their data collection protocol.

Image synthesis refers to the simulation of missing image modalities with machine learning algorithms using image modalities that are available. Synthetic image modalities can in some cases provide additional predictive value beyond the original data used for synthesis [7]. We use kernel partial least squares regression (kPLSR) for synthesising ASL-MRI-based CBF maps using structural MRI (sMRI) features and carotid ultrasound flow measurements as regressors. Using partial volumes of cortical and sub-cortical regions as features allows the relation between cerebral volume loss and reduction in CBF in dementia patients to be learned by the model. The synthetic CBF maps are used to generate CBF maps of patients for whom no ASL-MRI images are available (CBF imputation).

The challenge of building models using multi-modal data is that multiple cohorts may be required to achieve enough coverage, and some cohorts will have some modalities missing. We refer to this as the heterogeneous data problem. To address it, we modify the NIPALS algorithm for training the kPLSR model to work on heterogeneous data, where some features are missing in part of the input data X, and some of the output data Y are also missing. The synthetic CBF maps are utilised as classification features in discriminating patients with mild cognitive impairment (MCIs) from cognitively healthy controls (CHCs).

The MCI vs. CHC-problem is less studied than the AD vs. CHC-problem (see e.g. the review [1]) because sMRI-derived partial volume-features are less informative in the prodromal stage of AD. We apply a simultaneous feature selection and classification strategy based on: (i) use of regional CBF values averaged over anatomical subregions (instead of voxelwise values), and (ii) elastic net regression. The proposed classifiers are compared to MCI vs. CHC-classifiers from the literature that use different imaging modalities as features.

2 Methods

2.1 Acquisition and Pre-processing of Imaging Data

Data from two clinical centres and three different cohorts were included to increase the number of cases available for training models (Table 1). The combined data set was heterogeneous with respect to operator, MR field strength, and modalities available for each case. Three different sets of features were used.

Fig. 1. Workflow for extracting features from heterogeneous medical imaging data and training a kPLSR model for CBF feature synthesis. An MCI classifier is trained with both synthetic and real ASL-MRI features for comparison.

sMRI Features: T1-weighted sMRI were acquired and volumes of 141 cortical and sub-cortical regions were computed by propagating anatomical labels with the geodesic information flows-algorithm [2]. These features encapsulated grey matter (GM) atrophy but did not contain direct information about CBF.

Carotid Flow Features: Carotid ultrasound measurements were performed in two of the cohorts. Flow velocity signals were extracted from DICOM images and used to compute the mean flow rate and flow pulsatility indices (for both ICA-L and ICA-R separately), for a total of four features. These features encapsulated the baseline total CBF, but did not contain region-specific effects.

ASL Features: Pseudo-continuous ASL-MRI parameters were: TR/TE, 4,000 ms/14 ms; flip angle, 40\(^\circ \); FOV, 240 mm \(\times \) 240 mm; matrix size, 80 \(\times \) 80; 17 slices; thickness, 7 mm; labelling duration, 1.65 s; post-labelling delay, 1.525 s; and labelling gap, 20 mm. The ASL-MRI CBF maps were registered against the sMRI using SPM12 and equipped with maximum probability tissue labels defined on the MNI152 atlas, provided by Neuromorphometrics, Inc. from data collected in the OASIS project (http://www.oasis-brains.org/), and CBF was estimated in 144 anatomical regions. Regional CBF values were normalised for age and sex by using the w-scores method of [3] to obtain the final ASL features:

$$\begin{aligned} w_{n} = \frac{\text {CBF}_n - \left( \beta _0 + \text {gender}_n \cdot \beta _1 + \text {age}_n \cdot \beta _2 \right) }{\text {SD of residuals}}, \end{aligned}$$
(1)

where \(\text {CBF} = \beta _0 + \text {gender} \cdot \beta _1 + \text {age} \cdot \beta _2 + \varepsilon \) is a linear model fitted separately for each anatomical region on the sub-population of cognitively healthy controls.
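
As an illustration of Eq. (1), the w-score normalisation for a single region can be sketched in Python with NumPy. The function names, the 0/1 coding of gender, the degrees-of-freedom correction used for the residual SD, and the simulated control data are all assumptions made for this sketch and are not taken from the study.

```python
import numpy as np

def fit_w_score_model(cbf, gender, age):
    """Fit CBF = b0 + gender*b1 + age*b2 + eps on healthy controls (one region)."""
    design = np.column_stack([np.ones_like(age, dtype=float), gender, age])
    beta, *_ = np.linalg.lstsq(design, cbf, rcond=None)
    residuals = cbf - design @ beta
    # Residual SD; dividing by (n - 3) is an assumption of this sketch.
    sd = np.sqrt(np.sum(residuals ** 2) / (len(cbf) - design.shape[1]))
    return beta, sd

def w_score(cbf, gender, age, beta, sd):
    """Eq. (1): residual from the control model, scaled by the residual SD."""
    design = np.column_stack([np.ones_like(age, dtype=float), gender, age])
    return (cbf - design @ beta) / sd

# Simulated controls (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n_controls = 200
age = rng.uniform(55.0, 85.0, n_controls)
gender = rng.integers(0, 2, n_controls).astype(float)
cbf = 60.0 - 0.3 * age + 2.0 * gender + rng.normal(0.0, 5.0, n_controls)

beta, sd = fit_w_score_model(cbf, gender, age)
w_controls = w_score(cbf, gender, age, beta, sd)
```

By construction the w-scores of the controls have zero mean and unit spread, so a patient's w-score can be read as the number of control standard deviations by which the regional CBF deviates from the value expected for that age and sex.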

Table 1. Study cohorts contributing data to this work. AD cases not used in classification were included in the kPLSR-model training to increase coverage.

2.2 Feature Synthesis of the Regional CBF Maps

Given a matrix of inputs \( X \in \mathbb {R}^{n \times p}\) and a matrix of outputs \( Y \in \mathbb {R}^{n \times m}\), partial least squares-regression (PLSR) attempts to find a lower-dimensional representation of the input-output map using only \(\ell \ll p\) latent variables. This is achieved by simultaneous approximate decomposition of the two matrices as:

$$\begin{aligned} X \approx T P ^T, \quad Y \approx U Q ^T, \end{aligned}$$
(2)

where \( T , U \in \mathbb {R}^{n \times \ell }\) are the score matrices for \( X \) and \( Y \) respectively, and the loadings \( P \in \mathbb {R}^{p \times \ell }\), \( Q \in \mathbb {R}^{m \times \ell }\) maximise the covariance Cov\(( T ^T X , U ^T Y )\). The feature space for synthesising ASL-MRI maps consisted of 3 demographic features (age, weight, height), 141 sMRI features, and 4 carotid flow-features. Thus the maximum number of input features was \(p=148\). A total of \(n=249\) cases were available for learning a model to predict the \(m=137\) CBF features.

As the relation between CBF and partial volumes of cortical sub-regions in the brain was likely nonlinear, the kernel version of PLSR [6] was used. In this approach, the feature samples \( X \) are mapped using a nonlinear map, \(\varPsi ( X )\), and then the standard linear PLSR is performed in the mapped feature space \(\left( \varPsi ( X ), Y \right) \). The NIPALS algorithm [8] can be formulated in such a way that only inner products of the type \(\varPsi ( x _i)^T \varPsi ( x _j)\), for \(i,j=1,\ldots ,n\) are required. These can then be obtained using the kernel trick as: \(K(x_i,x_j) = \varPsi ( x _i)^T \varPsi ( x _j)\).

In the case of partially missing input data, we divided the features into two parts \(X = [ X_1 \, X_2 ]\), where \(X_1\) contained the features that are present for all samples, and \(X_2 = \emptyset \) whenever the remaining features were missing in the sample X. We then defined the modified kernel function:

$$\begin{aligned}{}[ \widetilde{K} ]_{i,j}(X,X) := \left\{ \begin{aligned} \varPsi ( x _{i,1})^T \varPsi ( x _{j,1}),&\quad \text { if } x _{i,2} = \emptyset \text { or } x _{j,2} = \emptyset \\ \varPsi ([ x _{i,1} \, x _{i,2}])^T \varPsi ([ x _{j,1} \, x _{j,2}]),&\quad \text { otherwise } \end{aligned} \right. , \end{aligned}$$
(3)
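
The modified kernel of Eq. (3) can be sketched as follows, assuming a Gaussian kernel of the form \(\exp (-\Vert x - y \Vert _2^2 / d)\) and assuming that missing optional features are encoded as NaN. The encoding, the function names, and the toy data are illustrative choices, not part of the original method description.

```python
import numpy as np

def gaussian_kernel(a, b, d):
    """K(x, y) = exp(-||x - y||^2 / d) for every pair of rows of a and b."""
    sq = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / d)

def modified_kernel(xa1, xa2, xb1, xb2, d):
    """Eq. (3): use only the always-present features (x1) whenever either
    sample lacks the optional features (x2, marked by NaN)."""
    miss_a = np.isnan(xa2).any(axis=1)
    miss_b = np.isnan(xb2).any(axis=1)
    k_common = gaussian_kernel(xa1, xb1, d)
    # NaNs are zero-filled only so the full kernel can be evaluated;
    # those entries are discarded by the np.where below.
    full_a = np.hstack([xa1, np.nan_to_num(xa2)])
    full_b = np.hstack([xb1, np.nan_to_num(xb2)])
    k_full = gaussian_kernel(full_a, full_b, d)
    either_missing = miss_a[:, None] | miss_b[None, :]
    return np.where(either_missing, k_common, k_full)

# Toy data: three samples, two common features, one optional feature.
x1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x2 = np.array([[1.0], [np.nan], [3.0]])  # sample 1 lacks the optional feature
k_tilde = modified_kernel(x1, x2, x1, x2, d=2.0)
```

In the toy data, the entry for samples 0 and 1 is computed from the two common features only, whereas the entry for samples 0 and 2 uses all three features.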

i.e. in the case of partially missing features the kernel function operated only on the subset of available features. Similarly, we divided the output matrix as \(Y = [Y_1; Y_2]\) such that \(Y_2 = \emptyset \) for all of the cases where the output data was missing, and defined the matrix \(S \in \mathbb {R}^{n \times n_1}\) as having ones on the diagonal and zero otherwise. It was used to extend the outputs \(Y_1\) from the restricted space to the full space, \(SY_1\). The rest of the NIPALS algorithm remained the same, as shown in Algorithm 1. The kPLSR estimator \(\widehat{ X }^{\text {CBF}}_n\) for the CBF features in the nth patient, learned from the demographic variables (\( X _{\text {train}}^{\text {demo}}\)), sMRI features (\( X _{\text {train}}^{\text {sMRI}}\)), and carotid flow features (\( X _{\text {train}}^{\text {carotid}}\)), was then given by the formula:

$$\begin{aligned} \widehat{ X }_{n}^{\text {CBF}} = \widetilde{K} \left( \mathcal {X}_{n}; \mathcal {X}_{\text {train}} \right) B , \end{aligned}$$
(4)

where \(\mathcal {X}_{(\cdot )} = [X_{(\cdot )}^\text {demo} X_{(\cdot )}^\text {sMRI} X_{(\cdot )}^\text {carotid}]\) was the combined feature vector. Figure 1 represents the workflow for extracting features, training a kPLSR model for CBF feature synthesis, and training an MCI vs. CHC binary classifier.
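
To make the role of the regression matrix \(B\) in Eq. (4) concrete, the following is a minimal sketch of a standard kernel NIPALS iteration on complete data. It is not the modified algorithm of this work: the handling of missing inputs and outputs (the modified kernel and the matrix \(S\)) is deliberately omitted, and the function names, the kernel centring step, the convergence settings, and the toy regression problem are assumptions of the sketch.

```python
import numpy as np

def center_kernel(k_train, k_new=None):
    """Centre a Gram matrix in feature space (training or new-vs-training)."""
    n = k_train.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    if k_new is None:
        return k_train - one_n @ k_train - k_train @ one_n + one_n @ k_train @ one_n
    one_m = np.full((k_new.shape[0], n), 1.0 / n)
    return k_new - one_m @ k_train - k_new @ one_n + one_m @ k_train @ one_n

def kpls_fit(k, y, n_components, n_iter=500, tol=1e-10):
    """Kernel NIPALS on a centred Gram matrix k (n x n) and centred outputs
    y (n x m). Returns B such that the fitted outputs are k @ B."""
    n = k.shape[0]
    k_defl, y_defl = k.copy(), y.copy()
    t_mat = np.zeros((n, n_components))
    u_mat = np.zeros((n, n_components))
    for a in range(n_components):
        u = y_defl[:, [0]].copy()          # initialise with an output column
        for _ in range(n_iter):
            t = k_defl @ u                 # input score
            t /= np.linalg.norm(t)
            c = y_defl.T @ t               # output weights
            u_new = y_defl @ c             # output score
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        t_mat[:, [a]], u_mat[:, [a]] = t, u
        proj = np.eye(n) - t @ t.T         # deflate along the new score
        k_defl = proj @ k_defl @ proj
        y_defl = y_defl - t @ (t.T @ y_defl)
    return u_mat @ np.linalg.solve(t_mat.T @ k @ u_mat, t_mat.T @ y)

# Toy problem: learn y = sin(x) with a Gaussian kernel (illustrative only).
x = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(x)
gram = np.exp(-(x - x.T) ** 2 / 2.0)
k_c = center_kernel(gram)
y_mean = y.mean(axis=0)
b = kpls_fit(k_c, y - y_mean, n_components=3)
y_fit = k_c @ b + y_mean                  # analogue of Eq. (4) on training data
```

For a new sample, the same matrix `b` would be applied to the centred kernel row between that sample and the training set, which is what `center_kernel(gram, k_new)` is intended to provide.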

Algorithm 1. NIPALS algorithm for kPLSR, modified for heterogeneous data.

2.3 Simultaneous Classification and Feature Selection

As the amount of available training data was modest and pre-selected anatomical regions were used instead of voxelwise CBF values, standard elastic net regression (ENR) classifier techniques were used to train four different classifiers:

  1. (i)

    In Model A, the sMRI features \( X ^{\text {sMRI}}\) were used to train an ENR-model:

    $$\begin{aligned} \min _{\beta _0, \beta } \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - X _{n}^{\text {sMRI}} \beta \right) ^2 + \lambda R( \beta ;\alpha ) \right\} \end{aligned}$$
    (5)

    with the elastic net regularisation term defined as \( R( \beta ;\alpha ) := \frac{(1-\alpha )}{2} \Vert \beta \Vert _2^2 + \alpha \Vert \beta \Vert _1. \) Here \(Y_{n} \in \mathbb {R}\) is the binary MCI diagnosis for the nth patient, \( X _{n}^{\text {sMRI}}\) denotes the sMRI features, \(\beta _0\) is the model intercept, and \( \beta \) are the regression weights. The continuous model prediction \(\widehat{ Y }^A = \beta _0 + X ^{\text {sMRI}} \beta \) was thresholded to a binary prediction to obtain the standard ROC-curve. The hyperparameters \(\lambda >0\) and \(\alpha \in (0,1]\) were chosen to maximise the area under the ROC-curve.

  2. (ii)

    In Model B, the synthesised CBF-features were used to train the model:

    $$\begin{aligned} \min _{\beta _0, \beta } \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - \widehat{ X }_{n}^{\text {CBF}} \beta \right) ^2 + \lambda R( \beta ;\alpha ) \right\} , \end{aligned}$$
    (6)

    where \(\widehat{ X }_{n}^{\text {CBF}}\) is the kPLSR estimator (4). In order to measure the effect of using synthetic vs. ASL-MRI-derived CBF values, Model B was trained using two different sets of data. In one case, when CBF features were missing we simply used synthetic CBF features in their place (MRI + synthetic). In another case, only the synthetic CBF features were used even if ASL-MRI was available (synthetic only). Again the continuous probability was thresholded to a binary prediction.

  3. (iii)

    In Model C, the feature selection was performed simultaneously on both sMRI and CBF features:

    $$\begin{aligned} \min _{\beta _0, \beta _1, \beta _2} \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - \widehat{ X }_{n}^{\text {CBF}} \beta _1 - X _{n}^{\text {sMRI}} \beta _2 \right) ^2 + \lambda R([ \beta _1; \beta _2];\alpha ) \right\} . \end{aligned}$$
    (7)
  4. (iv)

    In Model D, we combined Models A and B by using the maximum probability rule, \(\widehat{Y}_{n}^D = \max \{ \widehat{Y}_{n}^A, \widehat{Y}_{n}^B \}\). The rationale for this was that a combination of two diagnostic tests with high specificity but lower sensitivity (typical for AD classifiers) may provide a more sensitive diagnostic test while avoiding the problem that simultaneous feature selection favours one set of features over the other. Models C and D were likewise trained both with synthetic CBF features alone and with combined ASL-MRI and synthetic CBF features.
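
The elastic net objective shared by Models A to C, and the maximum rule of Model D, can be sketched with a plain coordinate-descent solver. The solver, the fixed values of \(\lambda \) and \(\alpha \), and the simulated features and labels are assumptions for illustration; the study's actual software and tuning procedure are not reproduced here, and features are assumed to be roughly standardised.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def enr_objective(x, y, b0, beta, lam, alpha):
    """1/(2N) * sum of squared residuals + lam * R(beta; alpha)."""
    resid = y - b0 - x @ beta
    penalty = 0.5 * (1.0 - alpha) * np.sum(beta ** 2) + alpha * np.sum(np.abs(beta))
    return np.sum(resid ** 2) / (2.0 * len(y)) + lam * penalty

def elastic_net(x, y, lam, alpha, n_sweeps=200):
    """Cyclic coordinate descent for the objective above."""
    n, p = x.shape
    beta = np.zeros(p)
    b0 = y.mean()
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with feature j left out of the fit.
            r = y - b0 - x @ beta + x[:, j] * beta[j]
            rho = x[:, j] @ r / n
            denom = x[:, j] @ x[:, j] / n + lam * (1.0 - alpha)
            beta[j] = soft_threshold(rho, lam * alpha) / denom
        b0 = np.mean(y - x @ beta)
    return b0, beta

# Simulated stand-ins for the sMRI and CBF feature sets (hypothetical data).
rng = np.random.default_rng(1)
n = 120
x_smri = rng.normal(size=(n, 10))
x_cbf = rng.normal(size=(n, 8))
labels = (x_smri[:, 0] + x_cbf[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(float)

b0_a, beta_a = elastic_net(x_smri, labels, lam=0.05, alpha=0.5)   # Model A analogue
b0_b, beta_b = elastic_net(x_cbf, labels, lam=0.05, alpha=0.5)    # Model B analogue
score_a = b0_a + x_smri @ beta_a
score_b = b0_b + x_cbf @ beta_b
score_d = np.maximum(score_a, score_b)                            # Model D rule
```

Because the L1 part of the penalty sets small coefficients exactly to zero, fitting the model also selects features; the Model D score is at least as large as either constituent score, which is why the combination can only raise sensitivity at a given cut-off.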

Fig. 2. Bland-Altman plot of kPLSR-modelled vs. ASL-MRI derived w-scores. Mean w-scores averaged over all WM/GM regions reported separately.

3 Experiments

3.1 Synthetic CBF vs. ASL-MRI Reconstructed CBF

The Gaussian kernel, \(K(x_1,x_2) = \exp (-\Vert x_1 - x_2 \Vert _2^2 / d)\), was used in the kPLSR model. This resulted in two model hyperparameters, the kernel width d and the number of latent variables \(\ell \), which were tuned using leave-one-out cross-validation. Only cases where the sMRI features were available (\(n=156\)) were used in model training and cross-validation. Out of these, carotid ultrasound and ASL-MRI features were present in 100 and 55 cases, respectively. Hyperparameter values optimising the \(R^2\)-statistic were found to be \(d=35\) and \(\ell =2\). The bias and standard deviation of the synthesised CBF relative to ground truth w-score values were measured using a Bland-Altman plot of w-scores averaged across all regions, separately for the white matter (WM) and grey matter (GM); see Fig. 2. The mean bias was \(\varDelta w = 0.20\) (\(p<0.001\)). The w-score is normalised so that its standard deviation in the normal population equals 1. The kPLS regressor slightly overestimated CBF in both the WM and GM.
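
The quantities read off a Bland-Altman plot can be computed as in the sketch below. The 1.96 SD limits of agreement are the conventional choice and are an assumption here, since the text reports only the mean bias; the function name and the four example w-scores are made up for illustration.

```python
import numpy as np

def bland_altman(synthetic, measured):
    """Return pairwise means, differences, mean bias and limits of agreement."""
    diff = synthetic - measured
    mean_pair = 0.5 * (synthetic + measured)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean_pair, diff, bias, limits

# Made-up region-averaged w-scores for four cases.
measured_w = np.array([0.0, -1.0, 0.5, 1.5])    # ASL-MRI derived
synthetic_w = np.array([0.3, -0.7, 0.6, 1.4])   # kPLSR modelled
mean_pair, diff, bias, limits = bland_altman(synthetic_w, measured_w)
```

A positive bias in this convention means that the synthetic w-scores are, on average, higher than the ASL-MRI derived ones.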

Table 2. Performance of the classifier in the CBF imputation problem (top half), compared with studies in the literature with at least 100 cases (bottom half). ENR = Elastic Net Regression, SVM = Support Vector Machine.

3.2 Utility of Synthetic ASL in the CBF Imputation Problem

The MCI classifiers using Models A, B, C, and D were trained with four-fold cross-validation (4-FCV) in a training set of \(n=123\) cases. An additional randomly selected validation set of \(n=20\) cases not included in the training was used to evaluate the balanced accuracy (ACC), sensitivity (SENS), and specificity (SPEC) of each classifier using hyperparameters and cut-offs obtained in 4-FCV. A heat map of the 46 features chosen by Model B is shown in Fig. 3.
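
The validation metrics can be computed from thresholded classifier scores as sketched below. Balanced accuracy is taken here as the mean of sensitivity and specificity, which is the usual definition but is an assumption, as the text does not state its formula; the scores, labels, and cut-off are invented for the example.

```python
import numpy as np

def classification_metrics(scores, labels, cutoff):
    """Balanced accuracy, sensitivity and specificity at a given cut-off."""
    pred = scores >= cutoff
    truth = labels.astype(bool)
    tp = np.sum(pred & truth)       # MCI correctly flagged
    fn = np.sum(~pred & truth)      # MCI missed
    tn = np.sum(~pred & ~truth)     # controls correctly cleared
    fp = np.sum(pred & ~truth)      # controls wrongly flagged
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return 0.5 * (sens + spec), sens, spec

# Invented validation scores: 1 = MCI, 0 = cognitively healthy control.
val_labels = np.array([1, 1, 1, 0, 0, 0, 0])
val_scores = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1])
acc, sens, spec = classification_metrics(val_scores, val_labels, cutoff=0.5)
```

In the procedure described above, the cut-off would be the one obtained in cross-validation on the training set, so that the validation cases play no part in choosing it.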

We compared our MCI classification accuracy to results reviewed in [1] with the following selection criteria: (i) the MCI vs. CHC classification problem was addressed, (ii) the feature set consisted of sMRI, ASL, or PET features, (iii) the cohort size was at least 100, and (iv) studies that used CSF biomarkers or neurocognitive test scores as features were excluded. The study with the best reported accuracy for each feature set was chosen as representative.

Results of the comparison are given in Table 2. Models A and B alone produced similar results in terms of accuracy, although Model B achieved better accuracy than was reported for ASL-MRI features in [3]. Model C improved the results slightly when MRI+synthetic CBF features were used, but the best results were obtained with Model D regardless of whether MRI+synthetic or synthetic only CBF features were used.

Fig. 3. Heat map of the coefficients \(\beta \) for the CBF imputation problem. A total of 46 regions were chosen as features. Regions with largest coefficients identified.

4 Discussion

Kernel PLS regression on heterogeneous data was used for the robust synthesis of regional CBF values in cases where no ASL-MRI images were available. As was reported in [3], CBF features alone were not particularly informative in MCI classification, but a multi-modal classifier using synthetic CBF features outperformed pure sMRI-based classifiers in a validation test. Best classifier performance (balanced accuracy 92%, sensitivity 100%, specificity 80%) was achieved when a maximum probability-rule was used to combine classifiers using different feature sets. The benefit of our proposed method is that only basic sMRI features (partial volumes of subregions) were used; as a result, synthetic CBF features can be generated in large-scale brain imaging databases, such as ADNI, without the need for extensive feature computation. It is possible that more informative sMRI features, e.g. ventricular and/or hippocampal shape morphometrics, could increase the accuracy of the resultant classifiers. Provided more ASL-MRI data were available, the use of convolutional neural networks on voxelwise CBF values should also be investigated to eliminate the need for pre-selecting anatomical regions for analysis.