Abstract
In this paper we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including the overlapping-group-lasso penalty and the network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem, expression quantitative trait loci (eQTL) mapping. We show that, with this algorithm, one can easily incorporate rich structural information among genes into the sparse CCA framework, which improves the interpretability of the results. The algorithm is based on a general excessive gap optimization framework and can scale up to millions of variables. We demonstrate its effectiveness on both simulated and real eQTL datasets.
Cite this article
Chen, X., Liu, H. An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping. Stat Biosci 4, 3–26 (2012). https://doi.org/10.1007/s12561-011-9048-z
DOI: https://doi.org/10.1007/s12561-011-9048-z