
An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping

Statistics in Biosciences

Abstract

In this paper, we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including the overlapping-group-lasso penalty and the network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem: expression quantitative trait locus (eQTL) mapping. We show that, with this efficient optimization algorithm, one can easily incorporate rich structural information among genes into the sparse CCA framework, which improves the interpretability of the results. Our optimization algorithm is based on a general excessive gap optimization framework and can scale up to millions of variables. We demonstrate the effectiveness of our algorithm on both simulated and real eQTL datasets.
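The two penalties named above, and the smoothing step that excessive-gap-type methods build on, can be illustrated with a short sketch. It assumes the usual definitions from the structured-sparsity literature rather than this paper's exact formulation: an overlapping group lasso Ω(u) = Σ_g w_g‖u_g‖₂, and a fusion penalty Σ_(i,j) |r_ij|·|u_i − sign(r_ij)·u_j| over network edges with correlations r_ij (the |r_ij| edge weighting is a hypothetical choice). `smoothed_group_penalty` applies Nesterov's smoothing, replacing each ‖u_g‖₂ by the maximum over the unit ball of w_g·α_gᵀu_g − (μ/2)‖α_g‖², which makes the penalty differentiable. All function names are hypothetical, and this is a building block only, not the authors' algorithm.

```python
import math
from typing import List, Sequence, Tuple


def group_lasso_penalty(u: Sequence[float], groups: List[List[int]],
                        weights: Sequence[float]) -> float:
    """Overlapping group lasso: sum_g w_g * ||u_g||_2 (groups may share indices)."""
    return sum(w * math.sqrt(sum(u[i] ** 2 for i in g))
               for g, w in zip(groups, weights))


def fusion_penalty(u: Sequence[float],
                   edges: List[Tuple[int, int, float]]) -> float:
    """Network fusion: sum over edges (i, j, r) of |r| * |u_i - sign(r) * u_j|.

    The |r| weight is an assumed choice; positively correlated neighbours are
    pushed toward equal coefficients, negatively correlated ones toward opposite.
    """
    return sum(abs(r) * abs(u[i] - math.copysign(1.0, r) * u[j])
               for i, j, r in edges)


def smoothed_group_penalty(u: Sequence[float], groups: List[List[int]],
                           weights: Sequence[float],
                           mu: float) -> Tuple[float, List[float]]:
    """Nesterov-smoothed group lasso; returns (value, gradient).

    f_mu(u) = sum_g max_{||a_g|| <= 1} [w_g * a_g^T u_g - (mu / 2) * ||a_g||^2].
    The maximiser is w_g * u_g / mu projected onto the unit Euclidean ball, and
    the gradient is the sum of w_g * a_g over groups (overlaps accumulate).
    """
    grad = [0.0] * len(u)
    value = 0.0
    for g, w in zip(groups, weights):
        z = [w * u[i] / mu for i in g]                 # unconstrained maximiser
        norm = math.sqrt(sum(v * v for v in z))
        a = [v / max(1.0, norm) for v in z]            # project onto unit ball
        value += (sum(w * ai * u[i] for ai, i in zip(a, g))
                  - 0.5 * mu * sum(ai * ai for ai in a))
        for ai, i in zip(a, g):
            grad[i] += w * ai
    return value, grad


if __name__ == "__main__":
    u = [3.0, 4.0, 0.0]
    groups = [[0, 1], [1, 2]]                          # index 1 is in both groups
    weights = [1.0, 1.0]
    print(group_lasso_penalty(u, groups, weights))
    print(fusion_penalty(u, [(0, 1, 0.5), (1, 2, -1.0)]))
    print(smoothed_group_penalty(u, groups, weights, mu=0.01))
```

A smaller μ tightens the approximation (for |G| groups the gap Ω(u) − f_μ(u) lies in [0, μ|G|/2]), but it also enlarges the Lipschitz constant of the gradient, which is the trade-off smoothing-based first-order methods have to balance.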



Author information


Correspondence to Xi Chen.


About this article

Cite this article

Chen, X., Liu, H. An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping. Stat Biosci 4, 3–26 (2012). https://doi.org/10.1007/s12561-011-9048-z


  • DOI: https://doi.org/10.1007/s12561-011-9048-z
