Abstract
Machine learning classifiers are frequently trained on heterogeneous multi-modal imaging data, where some patients have missing modalities. We address the problem of synthesising cerebral blood flow (CBF) features derived from arterial spin labelling magnetic resonance imaging (ASL-MRI) in a heterogeneous data set. We synthesise ASL-MRI features using T1-weighted structural MRI (sMRI) and carotid ultrasound flow features. To deal with heterogeneous data, we extend the kernel partial least squares regression (kPLSR) method to the case where both input and output data have partial coverage. The utility of the synthetic CBF features is tested on a binary classification problem of mild cognitive impairment patients vs. controls. Classifiers based on sMRI and synthetic ASL-MRI features are combined using a maximum probability rule, achieving a balanced accuracy of 92% (sensitivity 100%, specificity 80%) in a separate validation set. Comparison is made against support vector machine classifiers from the literature.
1 Introduction
Arterial spin-labelling magnetic resonance imaging (ASL-MRI) is a non-invasive blood flow imaging modality that can improve the diagnosis of Alzheimer’s disease (AD) by providing estimates of cerebral blood flow (CBF) and identifying regions of chronic cerebral hypoperfusion in individuals. However, ASL-MRI is still not part of clinical routine; many research databases used to train dementia classifiers, such as ADNI (Alzheimer’s Disease Neuroimaging Initiative), do not include ASL-MRI as part of their data collection protocol.
Image synthesis refers to the simulation of missing image modalities with machine learning algorithms using image modalities that are available. Synthetic image modalities can in some cases provide additional predictive value beyond the original data used for synthesis [7]. We use kernel partial least squares regression (kPLSR) for synthesising ASL-MRI-based CBF maps using structural MRI (sMRI) features and carotid ultrasound flow measurements as regressors. Using partial volumes of cortical and sub-cortical regions as features allows the relation between cerebral volume loss and reduction in CBF in dementia patients to be learned by the model. The trained model is then used to generate CBF maps for patients for whom no ASL-MRI images are available (CBF imputation).
The challenge of building models using multi-modal data is that multiple cohorts may be required to achieve sufficient coverage, and some cohorts will have some modalities missing. We refer to this as the heterogeneous data problem. To address it, we modify the NIPALS algorithm for training the kPLSR model to work on heterogeneous data, where some features are missing in part of the input data X, and some of the output data Y are also missing. The synthetic CBF maps are utilised as classification features in discriminating patients with mild cognitive impairment (MCIs) from cognitively healthy controls (CHCs).
The MCI vs. CHC problem is less studied than the AD vs. CHC problem (see e.g. the review [1]) because sMRI-derived partial volume features are less informative in the prodromal stage of AD. We apply a simultaneous feature selection and classification strategy based on: (i) use of regional CBF values averaged over anatomical subregions (instead of voxelwise values), and (ii) elastic net regression. The proposed classifiers are compared to MCI vs. CHC classifiers from the literature using different imaging modalities as features.
2 Methods
2.1 Acquisition and Pre-processing of Imaging Data
Data from two clinical centres and three different cohorts were included to increase the number of cases available for training models (Table 1). The combined data set was heterogeneous with respect to operator, MR field strength, and modalities available for each case. Three different sets of features were used.
sMRI Features: T1-weighted sMRI images were acquired and volumes of 141 cortical and sub-cortical regions were computed by propagating anatomical labels with the geodesic information flows algorithm [2]. These features encapsulated grey matter (GM) atrophy but did not contain direct information about CBF.
Carotid Flow Features: Carotid ultrasound measurements were performed in two of the cohorts. Flow velocity signals were extracted from DICOM images and used to compute the mean flow rate and the flow pulsatility index separately for the left and right internal carotid arteries (ICA-L and ICA-R), for a total of four features. These features encapsulated the baseline total CBF, but did not contain region-specific effects.
ASL Features: Pseudo-continuous ASL-MRI parameters were: TR/TE, 4,000 ms/14 ms; flip angle, 40\(^\circ \); FOV, 240 mm \(\times \) 240 mm; matrix size, 80 \(\times \) 80; 17 slices; thickness, 7 mm; labelling duration, 1.65 s; post-labelling delay, 1.525 s; and labelling gap, 20 mm. The ASL-MRI CBF maps were registered against the sMRI using SPM12 and equipped with maximum probability tissue labels defined on the MNI152 atlas, provided by Neuromorphometrics, Inc. from data collected in the OASIS project (http://www.oasis-brains.org/), and CBF was estimated in 144 anatomical regions. Regional CBF values were normalised for age and sex by using the w-scores method of [3] to obtain the final ASL features:
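$$\begin{aligned} w = \frac{\text {CBF} - \widehat{\text {CBF}}}{\sigma _{\varepsilon }}, \end{aligned}$$ (1)
where \(\widehat{\text {CBF}} = \beta _0 + \text {gender} \cdot \beta _1 + \text {age} \cdot \beta _2\) is the prediction of the linear model \(\text {CBF} = \beta _0 + \text {gender} \cdot \beta _1 + \text {age} \cdot \beta _2 + \varepsilon \), trained separately for each anatomical region on the sub-population of cognitively healthy controls, and \(\sigma _{\varepsilon }\) is the standard deviation of the residuals \(\varepsilon \) in that sub-population.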
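The w-score normalisation for a single anatomical region can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares fit on the control group and the residual standard deviation as the normaliser (the conventional w-score definition); the function names and the simulated control data are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_reference_model(cbf, sex, age):
    """Least-squares fit of CBF = b0 + sex*b1 + age*b2 + eps on healthy controls."""
    cbf, sex, age = (np.asarray(v, dtype=float) for v in (cbf, sex, age))
    A = np.column_stack([np.ones_like(age), sex, age])
    beta, *_ = np.linalg.lstsq(A, cbf, rcond=None)
    residuals = cbf - A @ beta
    # Residual standard deviation, corrected for the three fitted parameters.
    sigma = np.sqrt(residuals @ residuals / (len(cbf) - A.shape[1]))
    return beta, sigma

def w_score(cbf, sex, age, beta, sigma):
    """Deviation from the CBF expected for this age and sex, in control SD units."""
    cbf, sex, age = (np.asarray(v, dtype=float) for v in (cbf, sex, age))
    expected = beta[0] + beta[1] * sex + beta[2] * age
    return (cbf - expected) / sigma

# Simulated control group (illustrative values only).
rng = np.random.default_rng(0)
age = rng.uniform(55, 85, size=40)
sex = rng.integers(0, 2, size=40)
cbf = 80.0 - 0.4 * age + 3.0 * sex + rng.normal(0.0, 5.0, size=40)

beta, sigma = fit_reference_model(cbf, sex, age)
w_controls = w_score(cbf, sex, age, beta, sigma)
```

By construction the control w-scores have zero mean and unit residual spread, so a negative w-score in a patient indicates lower regional CBF than expected for that age and sex.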
2.2 Feature Synthesis of the Regional CBF Maps
Given a matrix of inputs \( X \in \mathbb {R}^{n \times p}\) and a matrix of outputs \( Y \in \mathbb {R}^{n \times m}\), partial least squares regression (PLSR) attempts to find a lower-dimensional representation of the input-output map using only \(\ell \ll p\) latent variables. This is achieved by simultaneous approximate decomposition of the two matrices as:
$$\begin{aligned} X = T P ^T + E , \qquad Y = U Q ^T + F , \end{aligned}$$ (2)
where \( T , U \in \mathbb {R}^{n \times \ell }\) are the score matrices for \( X \) and \( Y \) respectively, \( P \in \mathbb {R}^{p \times \ell }\) and \( Q \in \mathbb {R}^{m \times \ell }\) are the corresponding loading matrices, \( E \) and \( F \) are residual matrices, and the decomposition is chosen to maximise the covariance between corresponding columns of \( T \) and \( U \). The feature space for synthesising ASL-MRI maps consisted of 3 demographic features (age, weight, height), 141 sMRI features, and 4 carotid flow features. Thus the maximum number of input features was \(p=148\). A total of \(n=249\) cases were available for learning a model to predict the \(m=137\) CBF features.
As the relation between CBF and partial volumes of cortical sub-regions in the brain was likely nonlinear, the kernel version of PLSR [6] was used. In this approach, the feature samples \( X \) are mapped using a nonlinear map, \(\varPsi ( X )\), and then the standard linear PLSR is performed in the mapped feature space \(\left( \varPsi ( X ), Y \right) \). The NIPALS algorithm [8] can be formulated in such a way that only inner products of the type \(\varPsi ( x _i)^T \varPsi ( x _j)\), for \(i,j=1,\ldots ,n\) are required. These can then be obtained using the kernel trick as: \(K(x_i,x_j) = \varPsi ( x _i)^T \varPsi ( x _j)\).
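The kernel PLS procedure described above can be sketched as follows. This is an illustrative implementation of the standard NIPALS-style kernel PLS iteration of [6] with a Gaussian kernel, written for complete data only; the function names, the toy data, and the kernel width used in the example are assumptions of the sketch and do not reproduce the study's Algorithm 1.

```python
import numpy as np

def gaussian_kernel(A, B, d):
    """K(a, b) = exp(-||a - b||^2 / d) for all row pairs of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / d)

def centre_kernels(K_train, K_new):
    """Centre both Gram matrices in feature space using training-set means."""
    n = K_train.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    M = np.full((K_new.shape[0], n), 1.0 / n)
    return J @ K_train @ J, (K_new - M @ K_train) @ J

def kpls_fit(K, Y, n_components, tol=1e-10, max_iter=500):
    """Kernel PLS on a centred Gram matrix K and centred outputs Y."""
    n = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = Yd[:, [0]] / np.linalg.norm(Yd[:, 0])
        for _ in range(max_iter):
            t = Kd @ u                      # input score, uses inner products only
            t /= np.linalg.norm(t)
            c = Yd.T @ t                    # output weights
            u_new = Yd @ c                  # output score
            u_new /= np.linalg.norm(u_new)
            done = np.linalg.norm(u_new - u) < tol
            u = u_new
            if done:
                break
        T[:, [a]], U[:, [a]] = t, u
        D = np.eye(n) - t @ t.T             # deflate K and Y by the new score
        Kd, Yd = D @ Kd @ D, Yd - t @ (t.T @ Yd)
    # Dual regression coefficients, so that Y_hat = K_new @ B.
    B = U @ np.linalg.pinv(T.T @ K @ U) @ T.T @ Y
    return B, T

# Toy example: two smooth outputs of a single input feature.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 40)[:, None]
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])])
Y = Y + 0.05 * rng.normal(size=Y.shape)
X_new = np.array([[0.5], [1.5]])

y_mean = Y.mean(axis=0)
K, K_new = centre_kernels(gaussian_kernel(X, X, 2.0), gaussian_kernel(X_new, X, 2.0))
B, T = kpls_fit(K, Y - y_mean, n_components=4)
Y_fit = K @ B + y_mean       # fitted values on the training inputs
Y_new = K_new @ B + y_mean   # predictions for unseen inputs
```

Only the Gram matrices enter the computation, which is what allows the kernel function to be swapped for a variant that tolerates missing feature blocks.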
In the case of partially missing input data, we divided the features into two parts \(X = [ X_1 \, X_2 ]\), where \(X_1\) contained the features that were present for all samples, and \(X_2 = \emptyset \) whenever the remaining features were missing for a given sample. We then defined the modified kernel function:
i.e. in the case of partially missing features the kernel function operated only on the subset of available features. Similarly, we divided the output matrix as \(Y = [Y_1; Y_2]\) such that \(Y_2 = \emptyset \) for all of the cases where the output data was missing, and defined the matrix \(S \in \mathbb {R}^{n \times n_1}\) as having ones on the diagonal and zero otherwise. It was used to extend the outputs \(Y_1\) from the restricted space to the full space, \(SY_1\). The rest of the NIPALS algorithm remained the same, as shown in Algorithm 1. The kPLSR estimator \(\widehat{ X }^{\text {CBF}}_n\) for the CBF features in the nth patient, learned from the demographic variables (\( X _{\text {train}}^{\text {demo}}\)), sMRI features (\( X _{\text {train}}^{\text {sMRI}}\)), and carotid flow features (\( X _{\text {train}}^{\text {carotid}}\)), was then given by the formula:
where \(\mathcal {X}_{(\cdot )} = [X_{(\cdot )}^\text {demo} X_{(\cdot )}^\text {sMRI} X_{(\cdot )}^\text {carotid}]\) was the combined feature vector. Figure 1 represents the workflow for extracting features, training a kPLSR model for CBF feature synthesis, and training an MCI vs. CHC binary classifier.
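The two modifications for heterogeneous data described in this section, a kernel restricted to the available features and an extension matrix \(S\) for partially observed outputs, can be sketched as follows. The sketch assumes a Gaussian kernel, encodes a missing \(X_2\) block as a row of NaN values, and uses the \(X_2\) features only for pairs of samples that both have them; these conventions and all function names are assumptions of the illustration rather than details confirmed by the text.

```python
import numpy as np

def modified_gaussian_kernel(X1_a, X2_a, X1_b, X2_b, d):
    """Gaussian kernel that uses X2 only for pairs where both samples have it.

    Missing X2 blocks are encoded as rows containing NaN (sketch convention).
    """
    sq = ((X1_a[:, None, :] - X1_b[None, :, :]) ** 2).sum(axis=2)
    has_a = ~np.isnan(X2_a).any(axis=1)
    has_b = ~np.isnan(X2_b).any(axis=1)
    both = has_a[:, None] & has_b[None, :]
    diff = np.nan_to_num(X2_a)[:, None, :] - np.nan_to_num(X2_b)[None, :, :]
    sq = sq + np.where(both, (diff ** 2).sum(axis=2), 0.0)
    return np.exp(-sq / d)

def extension_matrix(has_output):
    """S in R^{n x n1}: maps the n1 observed output rows into the full n rows."""
    idx = np.flatnonzero(has_output)
    S = np.zeros((len(has_output), len(idx)))
    S[idx, np.arange(len(idx))] = 1.0
    return S

# Three samples; the second one lacks the X2 block and the output data.
X1 = np.array([[0.0], [1.0], [2.0]])
X2 = np.array([[0.0], [np.nan], [3.0]])
K = modified_gaussian_kernel(X1, X2, X1, X2, d=35.0)

Y1 = np.array([[1.0, 2.0], [3.0, 4.0]])           # observed outputs only
S = extension_matrix([True, False, True])
Y_full = S @ Y1                                   # zero rows where outputs are missing
```

When the samples are ordered with the observed outputs first, this `S` reduces to the matrix with ones on the diagonal described above. Note that a Gram matrix assembled piecewise in this way is not guaranteed to be positive semi-definite, which is worth checking numerically in practice.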
2.3 Simultaneous Classification and Feature Selection
As the amount of available training data was modest and pre-selected anatomical regions were used instead of voxelwise CBF values, standard elastic net regression (ENR) classifier techniques were used to train three different classifiers, complemented by a fourth model that combined two of them:
(i) In Model A, the sMRI features \( X ^{\text {sMRI}}\) were used to train an ENR model:
$$\begin{aligned} \min _{\beta _0, \beta } \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - X _{n}^{\text {sMRI}} \beta \right) ^2 + \lambda R( \beta ;\alpha ) \right\} \end{aligned}$$ (5)
with the elastic net regularisation term defined as \( R( \beta ;\alpha ) := \frac{(1-\alpha )}{2} \Vert \beta \Vert _2^2 + \alpha \Vert \beta \Vert _1. \) Here \(Y_{n} \in \mathbb {R}\) is the binary MCI diagnosis for the nth patient, \( X _{n}^{\text {sMRI}}\) denotes the sMRI features, \(\beta _0\) is the model intercept, and \( \beta \) are the regression weights. The continuous model prediction \(\widehat{ Y }^A = \beta _0 + X ^{\text {sMRI}} \beta \) was thresholded to a binary prediction, and the threshold was varied to obtain the standard ROC curve. The hyperparameters \(\lambda >0\) and \(\alpha \in (0,1]\) were chosen to maximise the area under the ROC curve.
(ii) In Model B, the synthesised CBF features were used to train the model:
$$\begin{aligned} \min _{\beta _0, \beta } \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - \widehat{ X }_{n}^{\text {CBF}} \beta \right) ^2 + \lambda R( \beta ;\alpha ) \right\} , \end{aligned}$$ (6)
where \(\widehat{ X }_{n}^{\text {CBF}}\) is the kPLSR estimator defined in Sect. 2.2. In order to measure the effect of using synthetic vs. ASL-MRI-derived CBF values, Model B was trained using two different sets of data. In the first case, synthetic CBF features were substituted only where the ASL-MRI-derived CBF features were missing (MRI + synthetic). In the second case, only the synthetic CBF features were used even if ASL-MRI was available (synthetic only). Again the continuous model prediction was thresholded to a binary prediction.
(iii) In Model C, the feature selection was performed simultaneously on both sMRI and CBF features:
$$\begin{aligned} \min _{\beta _0, \beta _1, \beta _2} \left\{ \frac{1}{2N} \sum _{n=1}^N \left( Y_{n} - \beta _0 - \widehat{ X }_{n}^{\text {CBF}} \beta _1 - X _{n}^{\text {sMRI}} \beta _2 \right) ^2 + \lambda R([ \beta _1; \beta _2];\alpha ) \right\} . \end{aligned}$$ (7)
(iv) In Model D, we combined Models A and B by using the maximum probability rule, \(\widehat{Y}_{n}^D = \max \{ \widehat{Y}_{n}^A, \widehat{Y}_{n}^B \}\). The rationale for this was that a combination of two diagnostic tests with high specificity but lower sensitivity (typical for AD classifiers) may provide a more sensitive diagnostic test, while avoiding the problem that simultaneous feature selection favours one set of features over the other. Models C and D were likewise trained in two variants: with synthetic CBF features alone, and with ASL-MRI-derived CBF features supplemented by synthetic ones.
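The elastic net objective of Eq. (5) and the maximum probability rule of Model D can be sketched as follows. The solver below is a plain cyclic coordinate descent written for illustration; it is an assumption of this sketch and not necessarily the solver used in the study, and the function names and toy data are hypothetical.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator arising from the L1 penalty."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net_fit(X, y, lam, alpha, n_sweeps=200):
    """Cyclic coordinate descent for
    (1/2N) sum_n (y_n - b0 - x_n b)^2 + lam * ((1-alpha)/2 ||b||_2^2 + alpha ||b||_1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / N
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            rho = Xc[:, j] @ resid / N + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            resid += Xc[:, j] * (beta[j] - new)   # keep the residual up to date
            beta[j] = new
    return y_mean - x_mean @ beta, beta

def max_probability_rule(score_a, score_b):
    """Model D: elementwise maximum of the two continuous predictions."""
    return np.maximum(score_a, score_b)

# Toy data: binary labels driven by the first of three features.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(30, 3))
y_toy = (X_toy[:, 0] + 0.3 * rng.normal(size=30) > 0).astype(float)

b0, beta = elastic_net_fit(X_toy, y_toy, lam=0.05, alpha=0.5)
score_a = b0 + X_toy @ beta
combined = max_probability_rule(np.array([0.2, 0.9]), np.array([0.7, 0.1]))
```

Because the combined score is the maximum of the two, a case is flagged as soon as either model exceeds the cut-off, which is why the rule tends to raise sensitivity when the individual tests are specific but not sensitive.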
3 Experiments
3.1 Synthetic CBF vs. ASL-MRI Reconstructed CBF
The Gaussian kernel, \(K(x_1,x_2) = \exp (-\Vert x_1 - x_2 \Vert _2^2 / d)\), was used in the kPLSR model. This resulted in two model hyperparameters, the kernel width d and the number of latent variables \(\ell \), which were tuned using leave-one-out cross-validation. Only cases where the sMRI features were available (\(n=156\)) were used in model training and cross-validation. Out of these, carotid ultrasound and ASL-MRI features were present in 100 and 55 cases, respectively. Hyperparameter values optimising the \(R^2\)-statistic were found to be \(d=35\) and \(\ell =2\). The bias and standard deviation of the synthesised CBF relative to the ground-truth w-score values were assessed using a Bland-Altman plot of w-scores averaged across all regions, separately for the white matter (WM) and grey matter (GM); see Fig. 2. The mean bias was \(\varDelta w = 0.20\) (\(p<0.001\)). The w-score is normalised so that its standard deviation in the normal population equals 1. The kPLSR model slightly overestimated CBF in both the WM and GM.
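The quantities summarised by a Bland-Altman plot can be computed as in the following sketch. It assumes the conventional definition (mean of the paired differences as the bias, with limits of agreement at 1.96 standard deviations); the function name and the four toy w-score pairs are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def bland_altman(synthetic, reference):
    """Return the mean bias and the 95% limits of agreement of paired values."""
    diffs = [s - r for s, r in zip(synthetic, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Toy region-averaged w-scores: synthetic vs. ASL-MRI-derived (illustrative only).
synthetic_w = [0.5, 1.5, -0.2, 0.9]
reference_w = [0.1, 1.0, -0.3, 0.4]
bias, (lower, upper) = bland_altman(synthetic_w, reference_w)
```

A positive bias corresponds to the synthetic values lying above the reference values on average, i.e. an overestimate.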
3.2 Utility of Synthetic ASL in the CBF Imputation Problem
The MCI classifiers using Models A, B, C, and D were trained with four-fold cross-validation (4-FCV) in a training set of \(n=123\) cases. An additional randomly selected validation set of \(n=20\) cases not included in the training was used to evaluate the balanced accuracy (ACC), sensitivity (SENS), and specificity (SPEC) of each classifier using hyperparameters and cut-offs obtained in 4-FCV. A heat map of the 46 features chosen by Model B is shown in Fig. 3.
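The evaluation metrics named above can be computed as in the following sketch, which assumes the usual definitions: sensitivity and specificity from the confusion matrix, and balanced accuracy as their arithmetic mean. The function names, the cut-off, and the toy scores are hypothetical.

```python
def apply_cutoff(scores, cutoff):
    """Threshold continuous classifier outputs into binary predictions."""
    return [1 if s >= cutoff else 0 for s in scores]

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and balanced accuracy for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {"SENS": sens, "SPEC": spec, "ACC": 0.5 * (sens + spec)}

# Toy validation set: 1 = MCI, 0 = CHC (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
metrics = classification_metrics(y_true, apply_cutoff(scores, cutoff=0.5))
```

In the setting described above, the cut-off would be the one fixed during cross-validation, so that the validation cases play no part in choosing it.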
We compared our MCI classification accuracy to results reviewed in [1] with the following selection criteria: (i) the MCI vs. CHC classification problem was addressed, (ii) the feature set consisted of sMRI, ASL, or PET features, (iii) the cohort size was at least 100, and (iv) studies that used CSF biomarkers or neurocognitive test scores as features were excluded. The study with the best reported accuracy for each feature set was chosen as representative.
Results of the comparison are given in Table 2. Models A and B alone produced similar results in terms of accuracy, although Model B achieved better accuracy than was reported for ASL-MRI features in [3]. Model C improved the results slightly when MRI + synthetic CBF features were used, but the best results were obtained with Model D regardless of whether MRI + synthetic or synthetic only CBF features were used.
4 Discussion
Kernel PLS regression on heterogeneous data was used for the robust synthesis of regional CBF values in cases where no ASL-MRI images were available. As was reported in [3], CBF features alone were not particularly informative in MCI classification, but a multi-modal classifier using synthetic CBF features outperformed pure sMRI-based classifiers in a validation test. The best classifier performance (balanced accuracy 92%, sensitivity 100%, specificity 80%) was achieved when a maximum probability rule was used to combine classifiers using different feature sets. The benefit of our proposed method is that only basic sMRI features (partial volumes of subregions) were used; as a result, synthetic CBF features can be generated in large-scale brain imaging databases, such as ADNI, without the need for extensive feature computation. It is possible that more informative sMRI features, e.g. ventricular and/or hippocampal shape morphometrics, could increase the accuracy of the resultant classifiers. Provided more ASL-MRI data were available, the use of convolutional neural networks on voxelwise CBF values should also be investigated to eliminate the need for pre-selecting anatomical regions for analysis.
References
1. Arbabshirani, M., Plis, S., Sui, J., Calhoun, V.: Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. NeuroImage 145, 137–165 (2017)
2. Cardoso, M., Modat, M., Wolz, R., Melbourne, A., Cash, D., Rueckert, D.: Geodesic information flows: spatially-variant graphs and their application to segmentation and fusion. IEEE Trans. Med. Imag. 34(9), 1976–1988 (2015)
3. Collij, L., et al.: Application of machine learning to arterial spin labeling in mild cognitive impairment and Alzheimer disease. Radiology 281(3), 865–875 (2016)
4. Liu, M., Zhang, D., Shen, D.: Hierarchical fusion of features and classifier decisions for Alzheimer’s disease diagnosis. Hum. Brain Mapp. 35(4), 1305–1319 (2014)
5. Ortiz, A., Munilla, J., Álvarez-Illán, I., Górriz, J., Ramírez, J.: Exploratory graphical models of functional and structural connectivity patterns for Alzheimer’s disease diagnosis. Front. Comput. Neurosci. 9, 132 (2015)
6. Rosipal, R., Trejo, L.J.: Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2, 97–123 (2001)
7. van Tulder, G., de Bruijne, M.: Why does synthesized data improve multi-sequence classification? In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 531–538. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24553-9_65
8. Wold, S., Geladi, P., Esbensen, K., Öhman, J.: Multi-way principal components and PLS analysis. J. Chemom. 1, 41–56 (1987)
Acknowledgements
This work was funded by the FP7 project VPH-DARE@IT “Virtual Physiological Human: DementiA Research Enabled by IT” (FP7-ICT-2011-5.2-601055).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lassila, T., Faria, H.M., Sarrami-Foroushani, A., Meneghello, F., Venneri, A., Frangi, A.F. (2018). Multi-modal Synthesis of ASL-MRI Features with KPLS Regression on Heterogeneous Data. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science(), vol 11072. Springer, Cham. https://doi.org/10.1007/978-3-030-00931-1_54
DOI: https://doi.org/10.1007/978-3-030-00931-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00930-4
Online ISBN: 978-3-030-00931-1
eBook Packages: Computer Science (R0)