1 Introduction

Voxel-based analysis [1] of imaging data has enabled the detailed mapping of regionally specific effects associated with either group differences or continuous non-imaging variables, without the need to define a priori regions of interest. This is achieved by adopting a generative model that aims to explain signal variations as a function of categorical or continuous variables of clinical interest. Such a model is easy to interpret. However, it does not fully exploit the available data, since it ignores correlations between different brain regions [5].

Conversely, supervised multivariate pattern analysis methods take advantage of dependencies among image elements. Such methods typically adopt a discriminative setting to derive multivariate patterns that best distinguish the contrasted groups. This results in improved sensitivity, and numerous approaches have been proposed to efficiently obtain meaningful multivariate brain patterns [4, 6, 7, 10, 13, 14]. However, such approaches suffer from certain limitations. Specifically, their high expressive power often results in overfitting due to modeling spurious distracter patterns in the data [8]. Confounding variations may thus limit the application of such models in multi-site studies [12] characterized by significant population or scanner differences, and at the same time hinder the interpretability of the models. This limitation is compounded by the lack of analytical techniques to estimate the null distribution of the model parameters, which makes statistical inference costly because most multivariate techniques require permutation tests.

Hybrid generative discriminative models have been proposed to improve the interpretability of discriminative models [2, 11]. However, these models also lack an analytically obtainable null distribution, which makes it challenging to assess the statistical significance of their model parameters. Last but not least, their solutions are often obtained through non-convex optimization schemes, which reduces reproducibility and out-of-sample prediction performance.

To tackle the aforementioned challenges, we propose a novel framework termed the generative-discriminative machine (GDM), which aims to obtain a multivariate model that is both accurate in prediction and interpretable in its parameters. GDM combines ridge regression [9] and ordinary least squares (OLS) regression to obtain a model that is discriminative while at the same time able to reconstruct the imaging features using a low-rank approximation that involves the group information. Importantly, the proposed model admits a closed-form solution, which can be attained in the dual space, reducing computational cost. The closed-form solution of GDM further enables an analytic approximation of its null distribution, which makes statistical inference and p-value computation computationally efficient.

We validated the GDM framework on two large datasets. The first comprises Alzheimer’s disease (AD) patients and controls (n = 415), while the second comes from a multi-site Schizophrenia (SCZ) study (n = 853). Using the AD dataset, we demonstrated the robustness of GDM under varying confounding scenarios. Using the SCZ dataset, we demonstrated that GDM can handle multi-site data without overfitting to spurious patterns, while at the same time achieving advantageous discriminative performance.

2 Method

Generative Discriminative Machine: GDM aims to obtain a hybrid model that can both predict group differences and generate the underlying dataset. This is achieved by integrating a discriminative model (i.e., ridge regression [9]) with a generative model (i.e., ordinary least squares (OLS) regression). Ridge and OLS are chosen because they can readily handle both classification and regression problems, while admitting a closed-form solution.

Let \(\varvec{X}\in \varvec{R}^{n\times d}\) denote the n by d matrix that contains the d dimensional imaging features of n independent subjects arranged row-wise. Likewise, let \(\varvec{Y}\in \varvec{R}^{n}\) denote the vector that stores the clinical variables of the corresponding n subjects. GDM aims to relate the imaging features \(\varvec{X}\) with the clinical variables \(\varvec{Y}\) using the parameter vector \(\varvec{J}\in \varvec{R}^{d}\) by optimizing the following objective:

$$\begin{aligned} \min _{\varvec{J}} \underbrace{\Vert \varvec{J}\Vert _2^2 + \lambda _1 \Vert \varvec{Y}- \varvec{X}\varvec{J}\Vert _2^2}_{\text {ridge discriminator}} + \underbrace{\lambda _2 \Vert \varvec{X}^T - \varvec{J}\varvec{Y}^T\Vert _2^2}_{\text {OLS generator}}. \end{aligned}$$
(1)

If we now take into account information from k additional covariates (e.g., age, sex or other clinical markers) stored in \(\varvec{C}\in \varvec{R}^{n\times k}\), we obtain the following GDM objective:

$$\begin{aligned} \min _{\varvec{J},\varvec{W}_0,\varvec{A}_0} \underbrace{\Vert \varvec{J}\Vert _2^2 + \lambda _1 \Vert \varvec{Y}- \varvec{X}\varvec{J}- \varvec{C}\varvec{W}_0\Vert _2^2}_{\text {ridge discriminator}} + \underbrace{\lambda _2 \Vert \varvec{X}^T - \varvec{J}\varvec{Y}^T - \varvec{A}_0\varvec{C}^T\Vert _2^2}_{\text {OLS generator}}, \end{aligned}$$
(2)

where \(\varvec{W}_0 \in \varvec{R}^{k}\) contains the bias terms and \(\varvec{A}_0 \in \varvec{R}^{d\times k}\) the regression coefficients pertaining to their corresponding covariates. The inclusion of the bias terms in the ridge regression term allows us to preserve the direction of the parameter vector, i.e., the imaging pattern that distinguishes between the groups, while at the same time achieving accurate subject-specific classification by taking into account each sample’s demographic and other information. Similarly, the inclusion of additional coefficients in the OLS term allows each sample to be reconstructed by additionally taking into account its demographic or other information. Lastly, the hyperparameters \(\lambda _1\) and \(\lambda _2\) control the contributions of the discriminative and generative terms, respectively.
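For concreteness, the following minimal numpy sketch evaluates the objective of Eq. 2 directly from its definition. The function name and argument layout are our own illustration, not code from the original work.

```python
import numpy as np

def gdm_objective(J, W0, A0, X, Y, C, lam1, lam2):
    """Evaluate the GDM objective of Eq. 2.

    X: (n, d) imaging features, Y: (n,) clinical variable,
    C: (n, k) covariates, J: (d,) model parameters,
    W0: (k,) covariate bias terms, A0: (d, k) covariate coefficients.
    """
    # Ridge discriminator: penalized squared prediction error.
    ridge = np.sum(J ** 2) + lam1 * np.sum((Y - X @ J - C @ W0) ** 2)
    # OLS generator: reconstruct X^T as the rank-one term J Y^T
    # plus the covariate effects A0 C^T (squared Frobenius norm).
    ols = lam2 * np.sum((X.T - np.outer(J, Y) - A0 @ C.T) ** 2)
    return ridge + ols
```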

Closed-Form Solution: The formulation in Eq. 2 is minimized by the following closed-form solution:

$$\begin{aligned} \varvec{J}&= \left[ \varvec{I}+ \lambda _1 (\varvec{X}^T\varvec{X}-\varvec{X}^T\varvec{C}(\varvec{C}^T\varvec{C})^{-1}\varvec{C}^T\varvec{X}) +\lambda _2(\varvec{Y}^T\varvec{Y}-\varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1}\varvec{C}^T \varvec{Y})\right] ^{-1} \nonumber \\&\quad \times \left[ (\lambda _1 + \lambda _2) (\varvec{X}^T \varvec{Y}- \varvec{X}^T\varvec{C}(\varvec{C}^T\varvec{C})^{-1}\varvec{C}^T \varvec{Y})\right] , \end{aligned}$$
(3)

which requires a \(d \times d\) matrix inversion that can be costly in neuroimaging settings, where d is large. To reduce this cost, we solve Eq. 2 in the subject space using the dual variables \(\varvec{\varLambda }\in \varvec{R}^{n}\):

$$\begin{aligned} \varvec{\varLambda }= \varvec{M}^{-1}_{[1:n,1:n]}\bigg ( \varvec{I}+ \frac{\lambda _2 \varvec{X}\varvec{X}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T - \lambda _2 \varvec{X}\varvec{X}^T}{1+\lambda _2(\varvec{Y}^T\varvec{Y}- \varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y})}\bigg )\varvec{Y}, \end{aligned}$$
(4)

where \(\varvec{M}\) is the following \((n+k) \times (n+k)\) matrix:

$$\begin{aligned} \varvec{M}= \left[ \begin{matrix} -\frac{\varvec{X}\varvec{X}^T}{1+\lambda _2(\varvec{Y}^T\varvec{Y}- \varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y})} - \varvec{I}/\lambda _1 & \varvec{C}\\ \varvec{C}^T & \varvec{0} \end{matrix} \right] . \end{aligned}$$
(5)

The dual variables \(\varvec{\varLambda }\) can then be used to recover \(\varvec{J}\) via:

$$\begin{aligned} \varvec{J}&= \frac{\lambda _2 \varvec{X}^T \varvec{Y}- \lambda _2 \varvec{X}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y}- \varvec{X}^T \varvec{\varLambda }}{1+\lambda _2 (\varvec{Y}^T\varvec{Y}- \varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y})}. \end{aligned}$$
(6)
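Putting Eqs. 4-6 together, a minimal numpy sketch of the dual-space solver might read as follows; only an \((n+k)\times (n+k)\) system is inverted, rather than the \(d\times d\) system of Eq. 3. Function and variable names are ours, and numerical safeguards (e.g., regularizing \(\varvec{C}^T\varvec{C}\)) are omitted.

```python
import numpy as np

def gdm_fit_dual(X, Y, C, lam1, lam2):
    """Closed-form GDM solution in the dual (subject) space, Eqs. 4-6."""
    n, d = X.shape
    k = C.shape[1]
    K = X @ X.T                                # n x n Gram matrix
    P = C @ np.linalg.solve(C.T @ C, C.T)      # projection onto covariates
    s = 1.0 + lam2 * (Y @ Y - Y @ P @ Y)       # scalar denominator of Eqs. 4-6
    # Block matrix M of Eq. 5, size (n + k) x (n + k).
    M = np.zeros((n + k, n + k))
    M[:n, :n] = -K / s - np.eye(n) / lam1
    M[:n, n:] = C
    M[n:, :n] = C.T
    Minv_nn = np.linalg.inv(M)[:n, :n]         # top-left n x n block of M^{-1}
    # Dual variables of Eq. 4.
    Lam = Minv_nn @ (np.eye(n) + (lam2 * K @ P - lam2 * K) / s) @ Y
    # Primal parameters recovered via Eq. 6.
    J = (lam2 * X.T @ Y - lam2 * X.T @ (P @ Y) - X.T @ Lam) / s
    return J
```

Since \(d \gg n\) for voxel-wise features, this dual path avoids forming any \(d \times d\) matrix.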

Analytic Approximation of Null Distribution: Using the dual formulation, the GDM parameters \(\varvec{J}\) can be shown to be a linear function of the group labels \(\varvec{Y}\) through the following matrix \(\mathbf {Q}\):

$$\begin{aligned} \mathbf {Q} = \frac{\lambda _2 \varvec{X}^T - \lambda _2 \varvec{X}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T-\varvec{X}^T \varvec{M}^{-1}_{[1:n,1:n]}\bigg ( \varvec{I}+ \frac{\lambda _2 \varvec{X}\varvec{X}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T - \lambda _2 \varvec{X}\varvec{X}^T}{1+\lambda _2(\varvec{Y}^T\varvec{Y}- \varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y})}\bigg )}{1+\lambda _2( \varvec{Y}^T\varvec{Y}- \varvec{Y}^T \varvec{C}(\varvec{C}^T\varvec{C})^{-1} \varvec{C}^T \varvec{Y})}, \end{aligned}$$

such that \(\varvec{J}= \mathbf {Q}\varvec{Y}\), where \(\mathbf {Q}\) is approximately invariant to permutations of \(\varvec{Y}\). Assuming \(\varvec{Y}\) has zero mean and unit variance, it follows that \(\text {E}(J_i) = 0\) and \(\text {Var}(J_i) = \sum _j Q_{i,j}^2\) under random permutations of \(\varvec{Y}\) [15, 16]. Asymptotically, \(J_i \rightarrow \mathcal {N}\big (0,\sqrt{\sum _j Q_{i,j}^2}\big )\), which allows efficient statistical inference on the parameter values \(J_i\).
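A sketch of the resulting inference procedure, under the stated assumption that \(\varvec{Y}\) is standardized, might read as follows (names are again illustrative, reusing the quantities defined above):

```python
import numpy as np
from scipy.stats import norm

def gdm_analytic_pvalues(X, Y, C, lam1, lam2):
    """Analytic p-values for the GDM parameters via J = Q Y.

    Assumes Y has been standardized to zero mean, unit variance.
    """
    n, k = X.shape[0], C.shape[1]
    K = X @ X.T
    P = C @ np.linalg.solve(C.T @ C, C.T)
    s = 1.0 + lam2 * (Y @ Y - Y @ P @ Y)
    M = np.zeros((n + k, n + k))
    M[:n, :n] = -K / s - np.eye(n) / lam1
    M[:n, n:] = C
    M[n:, :n] = C.T
    Minv_nn = np.linalg.inv(M)[:n, :n]
    # Matrix Q such that J = Q Y (d x n).
    Q = (lam2 * X.T - lam2 * X.T @ P
         - X.T @ Minv_nn @ (np.eye(n) + (lam2 * K @ P - lam2 * K) / s)) / s
    J = Q @ Y
    sd = np.sqrt((Q ** 2).sum(axis=1))      # Var(J_i) = sum_j Q_ij^2
    p = 2.0 * norm.sf(np.abs(J / sd))       # two-sided p-values
    return J, p
```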

3 Experimental Validation

We compared GDM with a purely discriminative model, namely ridge regression [9], as well as with its generative counterpart, which was obtained through the procedure outlined by Haufe et al. [8]. We chose these methods because their simple form allows the computation of their null distributions, which in turn enables the comparison of the statistical significance of their parameter maps. For ridge regression and its generative counterpart, the covariates (i.e., \(\varvec{C}= [\text {age}~\text {sex}]\)) were linearly residualized using the training set.
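As an illustration, this residualization step could be implemented as below; the interface is hypothetical, and the key point is that the regression coefficients are estimated on the training set only and then applied to the test set.

```python
import numpy as np

def residualize(X_train, X_test, C_train, C_test):
    """Remove linear covariate effects, with coefficients fit on the
    training set only (as done for the baseline methods)."""
    B = np.linalg.lstsq(C_train, X_train, rcond=None)[0]  # k x d coefficients
    return X_train - C_train @ B, X_test - C_test @ B
```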

We used two large datasets in two different settings. First, we used a subset of the ADNI study, consisting of 228 controls (CN) and 187 Alzheimer’s disease (AD) patients, to evaluate out-of-sample prediction accuracy and reproducibility. Second, we used data from a multi-site Schizophrenia study, which consisted of 401 patients (SCZ) and 452 controls (CN) spanning three sites (USA n = 236, China n = 286, and Germany n = 331), to evaluate the cross-site prediction and reproducibility of each method.

For all datasets, T1-weighted MRI volumetric scans were obtained at 1.5 Tesla. The images were pre-processed through a pipeline consisting of (1) skull-stripping; (2) N3 bias correction; and (3) deformable mapping to a standardized template space. Following these steps, a low-level representation of the tissue volumes was extracted by automatically partitioning the MRI volumes of all participants into 151 volumetric regions of interest (ROI). The ROI segmentation was performed by applying a multi-atlas label fusion method. The derived ROIs were used as the input features for all methods.

Analytical Approximation of p-Values. To confirm that the analytic approximation of the null distribution of GDM is correct, we estimated p-values both through the approximation and through permutation testing, using 10 to 10,000 permutations to observe the error rate. This experiment was performed on the ADNI dataset. The results displayed in Fig. 1 demonstrate that the analytic approximation holds, with the error decreasing at an approximately \(O(1/\sqrt{\# \text {permutations}})\) rate.
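For reference, the permutation baseline can be sketched as follows, reusing the gdm_fit_dual sketch from Sect. 2; the element-wise two-sided exceedance criterion and the add-one estimator are our assumptions about a standard permutation test, not a description of the authors' exact implementation.

```python
import numpy as np

def permutation_pvalues(X, Y, C, lam1, lam2, n_perm=1000, seed=0):
    """Permutation-based p-values: refit GDM on label permutations
    and count sign-agnostic exceedances of the observed parameters."""
    rng = np.random.default_rng(seed)
    J_obs = gdm_fit_dual(X, Y, C, lam1, lam2)
    exceed = np.zeros_like(J_obs)
    for _ in range(n_perm):
        J_perm = gdm_fit_dual(X, rng.permutation(Y), C, lam1, lam2)
        exceed += np.abs(J_perm) >= np.abs(J_obs)
    return (exceed + 1.0) / (n_perm + 1.0)   # add-one p-value estimator
```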

Fig. 1. Comparison of permutation-based p-values of GDM with their analytic approximations at varying permutation levels.

Out-of-sample Prediction and Reproducibility. To assess the discriminative performance and reproducibility of the compared methods under varying confounding scenarios, we used the ADNI dataset. We simulated four distinct training scenarios in order of increasing potential for confounding effects:
\(\bullet \) Case 1: \(50\%\) AD + \(50\%\) CN subjects, mean age balanced;
\(\bullet \) Case 2: \(75\%\) CN + \(25\%\) AD, mean age balanced;
\(\bullet \) Case 3: \(50\%\) AD + \(50\%\) CN, oldest ADs, youngest CNs;
\(\bullet \) Case 4: \(75\%\) CN + \(25\%\) AD, oldest ADs, youngest CNs.

All models had their respective hyperparameters (\(\lambda _1,\lambda _2 \in \lbrace 10^{-5},\ldots ,10^2 \rbrace \)) cross-validated in an inner fold before performing out-of-sample prediction on a left-out test set consisting of equal numbers of AD and CN subjects with balanced mean age. Furthermore, the normalized inner product of the training model parameters was compared between folds to assess the reproducibility of the models, as sketched below. Training and testing folds were shuffled 100 times to yield a distribution.
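One plausible reading of this reproducibility criterion is the mean pairwise normalized inner product (cosine similarity) between the parameter vectors learned across folds, as in the following sketch (names are illustrative):

```python
import numpy as np

def reproducibility(J_list):
    """Mean normalized inner product between the parameter vectors
    obtained across resampled training folds."""
    U = np.array([J / np.linalg.norm(J) for J in J_list])
    G = U @ U.T                              # pairwise cosine similarities
    iu = np.triu_indices_from(G, k=1)        # distinct fold pairs only
    return G[iu].mean()
```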

The prediction accuracies and the model reproducibility for the above cases are shown in Fig. 2. The results demonstrate that, although GDM is not a purely discriminative model, it outperformed ridge regression in prediction in all four cases. Regarding reproducibility, the procedure of Haufe et al. [8] yielded the most stable models, as expected of a purely generative model; nevertheless, GDM was more reproducible than ridge regression.

Fig. 2. Cross-validated out-of-sample AD vs. CN prediction accuracies (top row) and normalized inner-product reproducibility of training models (bottom row) for varying training scenarios and all compared methods.

Multi-site Study. To assess the predictive performance of the compared methods in a multi-site setting, we used the Schizophrenia dataset that comprises data from three sites. All models had their respective parameters cross-validated while training in one site before making predictions in the other two sites. Each training involved using \(90\%\) of the site samples to allow for resampling the training sets 100 times to yield a distribution. The reproducibility across the resampled sets was measured using the inner product between model parameters. The multi-site prediction and reproducibility results are visualized in Fig. 3.

In five out of six cross-site prediction settings, GDM outperformed all compared methods in terms of accuracy. GDM also had higher reproducibility than ridge regression, while having slightly lower reproducibility than the generative procedure of Haufe et al. [8].

Fig. 3. Cross-validated multi-site SCZ vs. CN prediction accuracies (left) and normalized inner-product reproducibility of training models (right) for all compared methods.

Statistical Maps and p-Values. To qualitatively assess and explain the predictive performance of the compared methods in the AD vs. CN scenario, we computed the model parameter maps using full-resolution gray matter tissue density maps for the ADNI dataset (Fig. 4, top). Furthermore, since the null distributions of GDM and ridge regression can be estimated analytically, we computed p-values for the model parameters and displayed the regions surviving false discovery rate (FDR) correction [3] at level \(q<0.05\) (Fig. 4, bottom).
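For completeness, the Benjamini-Hochberg FDR procedure [3] used to threshold the p-value maps can be sketched as follows (a standard implementation, not code from the original work):

```python
import numpy as np

def fdr_bh(p, q=0.05):
    """Benjamini-Hochberg procedure: return a boolean mask of the
    p-values surviving FDR correction at level q."""
    p = np.asarray(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, p.size + 1) / p.size   # q * i / m
    below = p[order] <= thresh
    mask = np.zeros(p.size, dtype=bool)
    if below.any():
        # Reject all hypotheses up to the largest rank passing the test.
        mask[order[: below.nonzero()[0].max() + 1]] = True
    return mask
```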

The statistical maps demonstrate that both GDM and the Haufe et al. [8] procedure yield patterns that accurately delineate the regions associated with AD, namely the widespread atrophy present in the temporal lobe, amygdala, and hippocampus. This contrasts with the patterns found by ridge regression, which resemble a hard-to-interpret speckle pattern with meaningful weights only in the hippocampus, once again confirming the tendency of purely discriminative models to capture spurious patterns. Furthermore, the p-value maps of the Haufe method and ridge regression demonstrate the wide difference between the features selected by generative and discriminative methods, and how GDM strikes a balance between the two to achieve superior predictive performance.

Fig. 4. Top: Normalized parameter maps of the compared methods for discerning group differences between AD patients and controls. Bottom: Parameter \(\log _{10}\) p-value maps of the compared methods after FDR correction at level \(q<0.05\). Warmer colors indicate decreasing volume with AD, while colder colors indicate increasing volume with AD.

4 Discussion and Conclusion

The interpretable patterns captured by GDM, coupled with its ability to outperform discriminative models in prediction, underline its potential for neuroimaging analysis. We demonstrated that GDM may obtain highly reproducible models through generative modeling, thus avoiding the overfitting commonly observed in neuroimaging settings. Overfitting is especially evident in multi-site settings, where discriminative models may subtly model spurious dataset effects that compromise out-of-site prediction accuracy. Furthermore, by using a formulation that yields a closed-form solution, we additionally demonstrated that it is possible to efficiently assess the statistical significance of the model parameters.

While the methodology presented herein amounts to generatively regularizing ridge regression with ordinary least squares regression, the proposed framework can be generalized to include generative regularization in other commonly used discriminative learning methods. Namely, it is possible to augment the linear discriminant analysis (LDA), support vector machine (SVM), or artificial neural network (ANN) objectives with a similar generative term to yield alternative generative discriminative models of learning. However, the latter two cases would not admit a closed-form solution, making it impossible to estimate a null distribution analytically.