Statistics in Biosciences

Volume 4, Issue 1, pp 3–26

An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping



In this paper we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including the overlapping-group-lasso penalty and the network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem, eQTL mapping. We show that, with this efficient optimization algorithm, one can easily incorporate rich structural information among genes into the sparse CCA framework, which improves the interpretability of the results. Our optimization algorithm is based on the general excessive gap optimization framework and can scale up to millions of variables. We demonstrate the effectiveness of our algorithm on both simulated and real eQTL datasets.
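To make the two penalties and the smoothing idea more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The network-based fusion penalty is assumed to have the graph-guided form Σ over edges (i, j) of τ_ij |u_i − s_ij u_j|, with edge weights τ_ij and edge signs s_ij. Only Nesterov's dual smoothing step is shown: excessive gap methods work with smoothed functions of this kind, but the full primal-dual iteration and the CCA objective itself are omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def group_penalty(u, groups, weights):
    # Overlapping group lasso: sum_g w_g * ||u_g||_2 (groups may share indices).
    return sum(w * np.linalg.norm(u[g]) for g, w in zip(groups, weights))


def fusion_penalty(u, edges, signs, taus):
    # ASSUMED graph-guided fusion form: sum over edges of tau * |u_i - s * u_j|.
    return sum(t * abs(u[i] - s * u[j]) for (i, j), s, t in zip(edges, signs, taus))


def smoothed_penalties(u, groups, weights, edges, signs, taus, mu):
    """Nesterov-smoothed value and gradient of group_penalty + fusion_penalty.

    Each penalty is a max of alpha^T (C u) over a bounded dual set.
    Subtracting (mu / 2) * ||alpha||^2 makes the maximizer alpha* unique,
    so the smoothed function is differentiable with gradient C^T alpha*.
    """
    value, grad = 0.0, np.zeros_like(u, dtype=float)
    for g, w in zip(groups, weights):
        z = w * u[g] / mu
        alpha = z / max(np.linalg.norm(z), 1.0)  # project onto the unit l2 ball
        value += w * alpha @ u[g] - 0.5 * mu * alpha @ alpha
        grad[g] += w * alpha  # indices are unique within a single group
    for (i, j), s, t in zip(edges, signs, taus):
        diff = u[i] - s * u[j]
        a = np.clip(t * diff / mu, -1.0, 1.0)  # project onto [-1, 1]
        value += t * a * diff - 0.5 * mu * a * a
        grad[i] += t * a
        grad[j] -= t * s * a
    return value, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.normal(size=6)
    groups, weights = [[0, 1, 2], [2, 3], [3, 4, 5]], [1.0, 0.5, 2.0]  # overlap on 2 and 3
    edges, signs, taus = [(0, 1), (1, 4), (2, 5)], [1, -1, 1], [0.8, 0.3, 1.2]
    exact = group_penalty(u, groups, weights) + fusion_penalty(u, edges, signs, taus)
    for mu in (1.0, 0.1, 0.001):
        val, _ = smoothed_penalties(u, groups, weights, edges, signs, taus, mu)
        print(f"mu={mu}: smoothed={val:.4f}, exact={exact:.4f}")
```

Because each penalty is a maximum of linear functions over a bounded dual set, the smoothed value never exceeds the exact penalty and stays within μ(|G| + |E|)/2 of it, while its gradient becomes Lipschitz with a constant proportional to 1/μ. Smoothing-based first-order methods trade off these two effects when choosing or shrinking μ.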


Keywords: Sparse CCA; Structured sparsity; Group structure; Network structure; Genome-wide association study; eQTL mapping; Optimization algorithm





Copyright information

© International Chinese Statistical Association 2011

Authors and Affiliations

  1. Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
  2. Biostatistics Department, Computer Science Department, Johns Hopkins University, Baltimore, USA
