Abstract
Canonical correlation analysis (CCA) is a widely used statistical technique to capture correlations between two sets of multivariate random variables and has found a multitude of applications in computer vision, medical imaging, and machine learning. The classical formulation assumes that the data live in a pair of vector spaces, which makes its use in certain important scientific domains problematic. For instance, the set of symmetric positive definite (SPD) matrices, rotations, and probability distributions all belong to certain curved Riemannian manifolds where vector-space operations are in general not applicable. Analyzing the space of such data via the classical versions of inference models is suboptimal. Using the space of SPD matrices as a concrete example, we present a principled generalization of the well known CCA to the Riemannian setting. Our CCA algorithm operates on the product Riemannian manifold representing SPD matrix-valued fields to identify meaningful correlations. As a proof of principle, we present experimental results on a neuroimaging data set to show the applicability of these ideas.
References
Afsari B (2011) Riemannian L^p center of mass: existence, uniqueness, and convexity. Proc Am Math Soc 139(2):655–673
Akaho S (2001) A kernel method for canonical correlation analysis. In: International meeting of the Psychometric Society
Andrew G, Arora R, Bilmes J, Livescu K (2013) Deep canonical correlation analysis. In: ICML
Avants BB, Gee JC (2004) Geodesic estimation for large deformation anatomical shape and intensity averaging. NeuroImage 23:S139–150
Avants BB, Epstein CL, Grossman M, Gee JC (2008) Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal 12:26–41
Avants BB, Cook PA, Ungar L, Gee JC, Grossman M (2010) Dementia induces correlated reductions in white matter integrity and cortical thickness: a multivariate neuroimaging study with SCCA. NeuroImage 50(3):1004–1016
Avants BB, Libon DJ, Rascovsky K, Boller A, McMillan CT, Massimo L, Coslett H, Chatterjee A, Gross RG, Grossman M (2014) Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. NeuroImage 84(1):698–711
Bach FR, Jordan MI (2003) Kernel independent component analysis. J Mach Learn Res 3:1–48
Basser PJ, Mattiello J, LeBihan D (1994) MR diffusion tensor spectroscopy and imaging. Biophys J 66(1):259
Bhatia R (2009) Positive definite matrices. Princeton University Press, Princeton
Callaghan PT (1991) Principles of nuclear magnetic resonance microscopy. Oxford University Press, Oxford
Chung MK, Worsley KJ, Paus T, Cherif C, Collins DL, Giedd JN, Rapoport JL, Evans AC (2001) A unified statistical approach to deformation-based morphometry. NeuroImage 14(3):595–606
Cook PA, Bai Y, Nedjati-Gilani S, Seunarine KK, Hall MG, Parker GJ, Alexander DC (2006) Camino: open-source diffusion-MRI reconstruction and processing. In: ISMRM, p 2759
Do Carmo MP (1992) Riemannian geometry. Birkhäuser, Boston
Ferreira R, Xavier J, Costeira JP, Barroso V (2006) Newton method for Riemannian centroid computation in naturally reductive homogeneous spaces. In: ICASSP
Fletcher PT (2013) Geodesic regression and the theory of least squares on Riemannian manifolds. Int J Comput Vis 105(2):171–185
Fletcher PT, Lu C, Pizer SM, Joshi S (2004) Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans Med Imaging 23(8):995–1005
Garrido L, Furl N, Draganski B, Weiskopf N, Stevens J, Chern-Yee Tan G, Driver J, Dolan RJ, Duchaine B (2009) Voxel-based morphometry reveals reduced grey matter volume in the temporal cortex of developmental prosopagnosics. Brain 132(12):3443–3455
Goh A, Lenglet C, Thompson PM, Vidal R (2009) A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI). In: CVPR, pp 2496–2503
Hardoon DR et al (2007) Unsupervised analysis of fMRI data using kernel canonical correlation. NeuroImage 37(4):1250–1259
Hardoon DR, Szedmak S, Shawe-Taylor J (2004) Canonical correlation analysis: an overview with application to learning methods. Neural Comput 16(12):2639–2664
Hinkle J, Fletcher PT, Joshi S (2014) Intrinsic polynomials for regression on Riemannian manifolds. J Math Imaging Vis 50:1–21
Ho J, Xie Y, Vemuri B (2013) On a nonlinear generalization of sparse coding and dictionary learning. In: ICML, pp 1480–1488
Ho J, Cheng G, Salehian H, Vemuri B (2013) Recursive Karcher expectation estimators and geometric law of large numbers. In: Proceedings of the Sixteenth international conference on artificial intelligence and statistics, pp 325–332
Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3/4):321–377
Hsieh WW (2000) Nonlinear canonical correlation analysis by neural networks. Neural Netw 13(10):1095–1105
Hua X, Leow AD, Parikshak N, Lee S, Chiang MC, Toga AW, Jack CR Jr, Weiner MW, Thompson PM (2008) Tensor-based morphometry as a neuroimaging biomarker for Alzheimer’s disease: an MRI study of 676 AD, MCI, and normal subjects. NeuroImage 43(3):458–469
Hua X, Gutman B et al (2011) Accurate measurement of brain changes in longitudinal MRI scans using tensor-based morphometry. NeuroImage 57(1):5–14
Huang H, He H, Fan X, Zhang J (2010) Super-resolution of human face image using canonical correlation analysis. Pattern Recogn 43(7):2532–2543
Huckemann S, Hotz T, Munk A (2010) Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Stat Sin 20:1–100
Jayasumana S, Hartley R, Salzmann M, Li H, Harandi M (2013) Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In: CVPR, pp 73–80
Karcher H (1977) Riemannian center of mass and mollifier smoothing. Commun Pure Appl Math 30(5):509–541
Kim HJ, Adluru N, Collins MD, Chung MK, Bendlin BB, Johnson SC, Davidson RJ, Singh V (2014) Multivariate general linear models (MGLM) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. In: CVPR
Kim T-K, Cipolla R (2009) CCA of video volume tensors for action categorization and detection. PAMI 31(8):1415–1428
Klein A, Andersson J, Ardekani B, Ashburner J, Avants BB, Chiang M, Christensen G, Collins L, Hellier P, Song J, Jenkinson M, Lepage C, Rueckert D, Thompson P, Vercauteren T, Woods R, Mann J, Parsey R (2009) Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. NeuroImage 46:786–802
Lai PL, Fyfe C (1999) A neural implementation of canonical correlation analysis. Neural Netw 12(10):1391–1397
Lebanon G (2005) Riemannian geometry and statistical machine learning. PhD thesis, Carnegie Mellon University
Li P, Wang Q, Zuo W, Zhang L (2013) Log-Euclidean kernels for sparse representation and dictionary learning. In: ICCV, pp 1601–1608
Moakher M (2005) A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J Matrix Anal Appl 26(3):735–747
Mostow GD (1973) Strong rigidity of locally symmetric spaces, vol 78. Princeton University Press, Princeton
Niethammer M, Huang Y, Vialard F (2011) Geodesic regression for image time-series. In: MICCAI, pp 655–662
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, New York
Rapcsák T (1991) Geodesic convexity in nonlinear optimization. J Opt Theory Appl 69(1):169–183
Said S, Courty N, Le Bihan N, Sangwine SJ et al (2007) Exact principal geodesic analysis for data on SO(3). In: Proceedings of the 15th European signal processing conference, pp 1700–1705
Shi X, Styner M, Lieberman J, Ibrahim JG, Lin W, Zhu H (2009) Intrinsic regression models for manifold-valued data. In: MICCAI, pp 192–199
Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM (2004) Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage 23:208–219
Sommer S (2013) Horizontal dimensionality reduction and iterated frame bundle development. In: Geometric science of information. Springer, Heidelberg, pp 76–83
Sommer S, Lauze F, Nielsen M (2014) Optimization over geodesics for exact principal geodesic analysis. Adv Comput Math 40(2):283–313
Steinke F, Hein M, Schölkopf B (2010) Nonparametric regression between general Riemannian manifolds. SIAM J Imaging Sci 3(3):527–563
Taniguchi H et al (1984) A note on the differential of the exponential map and Jacobi fields in a symmetric space. Tokyo J Math 7(1):177–181
Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3):515–534
Xie Y, Vemuri BC, Ho J (2010) Statistical analysis of tensor fields. In: MICCAI, pp 682–689
Yger F, Berar M, Gasso G, Rakotomamonjy A (2012) Adaptive canonical correlation analysis based on matrix manifolds. In: ICML
Yu S, Tan T, Huang K, Jia K, Wu X (2009) A study on gait-based gender classification. IEEE Trans Image Process 18(8):1905–1910
Zhang H, Yushkevich PA, Alexander DC, Gee JC (2006) Deformable registration of diffusion tensor MR images with explicit orientation optimization. Med Image Anal 10:764–785
Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 20(1):45–57
Zhu H, Chen Y, Ibrahim JG, Li Y, Hall C, Lin W (2009) Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging. J Am Stat Assoc 104(487):1203–1212
Acknowledgements
This work was supported in part by NIH grants AG040396 (VS), AG037639 (BBB), AG021155 (SCJ), AG027161 (SCJ), NS066340 (BCV), NSF CAREER award 1252725 (VS). Partial support was also provided by UW ADRC (AG033514), UW ICTR, Waisman Core grant (P30 HD003352), and the Center for Predictive Computational Phenotyping (CPCP) at UW-Madison (AI117924). The contents do not represent views of the Dept. of Veterans Affairs or the United States Government.
Appendices
Appendix 1
The iterative method in Algorithm 1 for Riemannian CCA with exact projection requires the first and second derivatives of g in (4.13). We provide the details here.
1.1 First Derivative of g for SPD
On SPD(n), the gradient of g with respect to t is obtained using the following proposition from [39].
Proposition 4.1.
Let F(t) be a real matrix-valued function of the real variable t. We assume that, for all t in its domain, F(t) is an invertible matrix which does not have eigenvalues on the closed negative real line. Then,
$$\frac{d}{dt}\operatorname{tr}\left[\log^{2}F(t)\right] = 2\operatorname{tr}\left[\log F(t)\,F(t)^{-1}\frac{d}{dt}F(t)\right].$$
The derivation of \(\frac{d}{dt_{i}}g(t_{i},\boldsymbol{w}_{x})\) proceeds by writing g as the squared geodesic distance
$$g(t_{i},\boldsymbol{w}_{x}) = d^{2}\left(X_{i}, S(t_{i})\right) = \operatorname{tr}\left[\log^{2}\left(X_{i}^{-1}S(t_{i})\right)\right],$$
where \(S(t_{i}) = \mathrm{Exp}(\mu_{x}, t_{i}W_{x}) = \mu_{x}^{1/2}\exp(t_{i}A)\,\mu_{x}^{1/2}\) and \(A = \mu_{x}^{-1/2}W_{x}\mu_{x}^{-1/2}\).
In our formulation, \(F(t) = X_{i}^{-1}S(t_{i})\). Then, we have \(F(t)^{-1} = S(t_{i})^{-1}X_{i}\) and \(\frac{d}{dt}F(t) = X_{i}^{-1}\dot{S}(t_{i})\). Hence, the derivative of g with respect to \(t_{i}\) is given by
$$\frac{d}{dt_{i}}g(t_{i},\boldsymbol{w}_{x}) = 2\operatorname{tr}\left[\log\left(X_{i}^{-1}S(t_{i})\right)S(t_{i})^{-1}\dot{S}(t_{i})\right],$$
where \(\dot{S}(t_{i}) = \mu_{x}^{1/2}A\exp(t_{i}A)\,\mu_{x}^{1/2}\).
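These expressions for \(S(t_i)\) and \(\dot{S}(t_i)\) translate directly into code. A minimal Python sketch (the function names are ours; `mu` and `W` stand for \(\mu_x\) and \(W_x\)):

```python
import numpy as np
from scipy.linalg import expm

def _sqrtm_spd(M):
    """Matrix square root of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(w)) @ U.T

def spd_exp(mu, W, t=1.0):
    """Geodesic S(t) = Exp(mu, t*W) = mu^{1/2} exp(t*A) mu^{1/2},
    where A = mu^{-1/2} W mu^{-1/2}."""
    mu_h = _sqrtm_spd(mu)
    mu_hi = np.linalg.inv(mu_h)
    A = mu_hi @ W @ mu_hi
    return mu_h @ expm(t * A) @ mu_h

def spd_exp_dot(mu, W, t=1.0):
    """Derivative of the geodesic, S'(t) = mu^{1/2} A exp(t*A) mu^{1/2}."""
    mu_h = _sqrtm_spd(mu)
    mu_hi = np.linalg.inv(mu_h)
    A = mu_hi @ W @ mu_hi
    return mu_h @ A @ expm(t * A) @ mu_h
```

The matrix square root is computed by eigendecomposition, which is stable for SPD inputs; a finite-difference check of `spd_exp_dot` against `spd_exp` is a useful sanity test.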
1.2 Numerical Expression for the Second Derivative of g
Riemannian CCA with exact projection can be optimized by Algorithm 1. Observe that the objective function of the proposed augmented Lagrangian method \(\mathcal{L}_{A}\) includes the term \(\nabla g\) in (4.13), so the gradient of \(\mathcal{L}_{A}\) involves the second derivatives of g. More precisely, we need \(\frac{d^{2}}{d\boldsymbol{w}\,dt}g\) and \(\frac{d^{2}}{dt^{2}}g\). These can be estimated by a finite difference method, e.g., the central difference
$$\frac{d^{2}}{dt^{2}}g(t) \approx \frac{\frac{d}{dt}g(t+h) - \frac{d}{dt}g(t-h)}{2h}$$
for a small step size h.
Obviously, \(\frac{d^{2}}{dt^{2}}g\) can be obtained from the expression above using the analytical first derivative \(\frac{d}{dt}g\). For \(\frac{d^{2}}{d\boldsymbol{w}\,dt}g\), we use an orthonormal basis of \(T_{\mu_{x}}\mathcal{M}\) to approximate the derivative. By the definition of the directional derivative, we have
$$\frac{\partial f}{\partial u_{i}}(x) = \lim_{h\to 0}\frac{f(x + hu_{i}) - f(x)}{h},\quad i = 1,\ldots,d,$$
where \(x \in \mathcal{X}\), d is the dimension of \(\mathcal{X}\), and \(\{u_{i}\}\) is an orthonormal basis of \(\mathcal{X}\). Hence, perturbing along the orthonormal basis directions enables us to approximate the gradient. For example, on the SPD(n) manifold, an orthonormal basis of an arbitrary tangent space \(T_{p}\mathcal{M}\) can be obtained by the following three steps:
- Step a): Pick an orthonormal basis \(\{e_{i}\}\) of \(\mathbb{R}^{n(n+1)/2}\).
- Step b): Convert \(\{e_{i}\}\) into n-by-n symmetric matrices \(\{u_{i}\}\) in \(T_{I}\mathcal{M}\), i.e., \(\{u_{i}\} = \mathrm{mat}(\{e_{i}\})\).
- Step c): Transform the basis \(\{u_{i}\}\) from \(T_{I}\mathcal{M}\) to \(T_{p}\mathcal{M}\).
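A Python sketch of these three steps (function names are ours; orthonormality in \(T_{p}\mathcal{M}\) is with respect to the affine-invariant metric \(\langle u,v\rangle_{p} = \operatorname{tr}(p^{-1}u\,p^{-1}v)\)):

```python
import numpy as np

def sym_basis(n):
    """Steps a)-b): an orthonormal basis {u_i} of T_I M, the n(n+1)/2-dim
    space of n-by-n symmetric matrices, w.r.t. the Frobenius inner product."""
    basis = []
    for i in range(n):
        for j in range(i, n):
            u = np.zeros((n, n))
            if i == j:
                u[i, i] = 1.0
            else:
                # off-diagonal pair scaled so that tr(u u) = 1
                u[i, j] = u[j, i] = 1.0 / np.sqrt(2.0)
            basis.append(u)
    return basis

def transport_basis(basis, p):
    """Step c): map u -> p^{1/2} u p^{1/2}; the result is orthonormal in
    T_p M under the affine-invariant metric <u, v>_p = tr(p^{-1} u p^{-1} v)."""
    w, U = np.linalg.eigh(p)
    p_h = (U * np.sqrt(w)) @ U.T
    return [p_h @ u @ p_h for u in basis]
```

Orthonormality is preserved because \(\operatorname{tr}(p^{-1}\,p^{1/2}u_{i}p^{1/2}\,p^{-1}\,p^{1/2}u_{j}p^{1/2}) = \operatorname{tr}(u_{i}u_{j})\).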
Appendix 2
MR Image Acquisition and Processing

All the MRI data were acquired on a GE Discovery MR750 3.0 Tesla scanner with an 8-channel head coil and parallel imaging (ASSET). The DWI data were acquired using a diffusion-weighted, spin-echo, single-shot, echo planar imaging (EPI) pulse sequence with diffusion weighting in 40 noncollinear directions at b = 1300 s/mm² in addition to 8 b = 0 (nondiffusion-weighted, or T2-weighted) images. The cerebrum was covered using contiguous 2.5-mm-thick axial slices, FOV = 24 cm, TR = 8000 ms, TE = 67.8 ms, matrix = 96 × 96, resulting in 2.5 mm isotropic voxels. High-order shimming was performed prior to the DTI acquisition to optimize the homogeneity of the magnetic field across the brain and to minimize EPI distortions. The brain region was extracted using the first b = 0 image as input to the brain extraction tool (BET), also part of the FSL software.
Eddy current-related distortion and head motion in each data set were corrected using the FSL software package [46]. The b-vectors were rotated using the rotation component of the transformation matrices obtained from the correction process. Geometric distortion from magnetic field inhomogeneity was corrected using the b = 0 field map together with PRELUDE (phase region expanding labeler for unwrapping discrete estimates) and FUGUE (FMRIB's utility for geometrically unwarping EPIs) from FSL. Twenty-nine subjects did not have field maps acquired during their imaging session. Because these participants did not differ in APOE4 genotype, sex, or age from the participants who had field map correction, they were included to enhance the final sample size. The diffusion tensors were then estimated from the corrected DWI data via nonlinear least squares using the Camino library [13].
Individual maps were registered to a population-specific template constructed using the diffusion tensor imaging toolkit (DTI-TK), an optimized DTI spatial normalization and atlas construction tool that has been shown to outperform scalar-based registration methods [55]. The template is constructed in an unbiased way that captures both the average diffusion features (e.g., diffusivities and anisotropy) and anatomical shape features (tract size) in the population. A subset of 80 diffusion tensor maps was used to create a common space template. All diffusion tensor maps were normalized to the template using rigid, then affine, and finally symmetric diffeomorphic transformations. The diffeomorphic coordinate deformations themselves are smooth and invertible, that is, neuroanatomical neighbors remain neighbors under the mapping. At the same time, the algorithms used to create these deformations are symmetric in that they are not biased towards the reference space chosen to compute the mappings. Moreover, these topology-preserving maps capture the large deformations necessary to aggregate populations of images in a common space. The spatially normalized data were interpolated to 2 × 2 × 2 mm voxels for the final CCA analysis.
Along with the DWI data, T1-weighted images were acquired using the BRAVO pulse sequence, which uses a 3D inversion recovery (IR)-prepared fast spoiled gradient recalled echo (FSPGR) acquisition to produce isotropic images at 1 × 1 × 1 mm resolution. We again extract the brain regions using BET. We compute an optimal template space, i.e., a population-specific, unbiased average shape and appearance image derived from our population [4]. We use the openly available advanced normalization tools (ANTS) to construct our template space and to register the individual subjects to that space [5]. ANTS encodes current best practice in image registration, optimal template construction, and segmentation [35]. Once the registrations are performed, we extract the Jacobian matrices J per voxel per subject from the deformation fields and compute the Cauchy deformation tensors \(\sqrt{J^{T}J}\) for the CCA analysis.
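The Cauchy deformation tensor is a pointwise matrix computation; a minimal sketch (the function name is ours; `J` is assumed to hold the 3 × 3 Jacobian at a single voxel):

```python
import numpy as np

def cauchy_deformation_tensor(J):
    """sqrt(J^T J): the SPD factor of the polar decomposition of J,
    computed via eigendecomposition of the symmetric matrix J^T J."""
    w, U = np.linalg.eigh(J.T @ J)
    w = np.clip(w, 0.0, None)  # guard against tiny negative round-off
    return (U * np.sqrt(w)) @ U.T
```

For an invertible Jacobian the result is SPD, so these tensors live on the same SPD(3) manifold as the diffusion tensors and can enter the Riemannian CCA directly.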
Additional Experiments

As described in the main text (Sect. 4.6.3), we examine the following multimodal linear relations, but this time performing CCA on the entire gray and white matter structures rather than just the hippocampi and cinguli:
To enable this analysis, the gray matter region was defined as follows: first, we performed a three-tissue segmentation of each spatially normalized T1-weighted image into gray matter, white matter, and cerebrospinal fluid using the FAST segmentation algorithm [56] implemented in FSL. Then, we averaged the gray matter probabilities across all subjects and thresholded the result to obtain the final binary mask of \(\sim\)700,000 voxels. The white matter region is simply defined as the region where the fractional anisotropy (FA) obtained from the diffusion tensors exceeds 0.2, which resulted in about 50,000 voxels. We contrast this with the region-specific analyses, where the number of voxels was much smaller. To handle these larger regions, we imposed an \(L_1\)-norm penalty on the weight vectors in our CCA objective function (similar to Euclidean sparse CCA [51]) with a tuning parameter \(\lambda\).
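For concreteness, such an \(L_1\) penalty can be imposed via a soft-thresholding step on the weight vector, as in the penalized matrix decomposition of [51]; a generic sketch (function names are ours; `lam` plays the role of \(\lambda\)):

```python
import numpy as np

def soft_threshold(a, lam):
    """Proximal operator of the L1 norm: sign(a) * max(|a| - lam, 0),
    applied elementwise."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_weight_update(a, lam):
    """One sparse weight update: soft-threshold the unconstrained update,
    then rescale to unit L2 norm. Zeroed entries stay zero, giving an
    interpretable sparse weight vector over the voxels."""
    w = soft_threshold(a, lam)
    nrm = np.linalg.norm(w)
    return w / nrm if nrm > 0 else w
```

Larger `lam` zeroes more voxel weights, which is how the sparsity parameter mentioned below trades interpretability against fidelity.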
In addition to the above linear relationships, CCA can also facilitate testing the following relationships by using the weight vectors as substructures of interest and taking the average mean diffusivity (\(\overline{MD}\)) and average volumetric change (\(\overline{\log\vert J\vert}\)) in those structures as the outcome measures:
where the null hypotheses to be tested are \(\beta_1 = \beta_2\) and \(\beta_1' = \beta_2'\). Similarly, we can find statistically significant AgeGroup differences by examining the following two models:
The scatter and bar plots for testing the linear relationships using both Riemannian and Euclidean CCA are shown in Figs. 4.7 and 4.8.
We now present a comprehensive set of montages of all the slices of the brain overlaid with the weight vectors obtained from CCA in Figs. 4.9, 4.10, 4.11, and 4.12.
We can observe that the voxels with nonzero weights (highlighted in red boxes) are spatially complementary in DTI and T1W. Even more interestingly, CCA finds the cingulum regions in the white matter for the DTI modality. In our experiments, we observe the same regions for various settings of the sparsity parameter (used heuristically to obtain more interpretable regions). Similarly, Figs. 4.11 and 4.12 show the results of the Euclidean CCA performed using the full tensor information [51]. We can observe that the results are similar to those in Figs. 4.9 and 4.10, but the regions are thinner in the Euclidean version.
Copyright information
© 2016 Springer International Publishing Switzerland
Kim, H.J., Adluru, N., Bendlin, B.B., Johnson, S.C., Vemuri, B.C., Singh, V. (2016). Canonical Correlation Analysis on SPD(n) Manifolds. In: Turaga, P., Srivastava, A. (eds) Riemannian Computing in Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-22957-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22956-0
Online ISBN: 978-3-319-22957-7