Statistics in Biosciences, Volume 4, Issue 1, pp 157–176

Frequent Pattern Discovery in Multiple Biological Networks: Patterns and Algorithms

  • Wenyuan Li
  • Haiyan Hu
  • Yu Huang
  • Haifeng Li
  • Michael R. Mehan
  • Juan Nunez-Iglesias
  • Min Xu
  • Xifeng Yan
  • Xianghong Jasmine Zhou
Open Access
Article

Abstract

The rapid accumulation of biological network data is creating an urgent need for computational methods capable of integrative network analysis. This paper discusses a suite of algorithms that we have developed to discover biologically significant patterns that appear frequently in multiple biological networks: coherent dense subgraphs, frequent dense vertex-sets, generic frequent subgraphs, differential subgraphs, and recurrent heavy subgraphs. We demonstrate these methods on gene co-expression networks, using the identified patterns to systematically annotate gene functions, map genome to phenome, and perform high-order cooperativity analysis.

Keywords

Frequent pattern; Integrative network analysis; Coherent dense subgraph; Frequent dense vertex-set; Generic frequent subgraph; Differential subgraph; Recurrent heavy subgraph; Tensor representation of multiple networks

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Wenyuan Li (1)
  • Haiyan Hu (2)
  • Yu Huang (1)
  • Haifeng Li (3)
  • Michael R. Mehan (1)
  • Juan Nunez-Iglesias (1)
  • Min Xu (1)
  • Xifeng Yan (4)
  • Xianghong Jasmine Zhou (1)

  1. Program in Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, USA
  2. School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, USA
  3. Motorola Labs, Tempe, USA
  4. Computer Science Department, University of California at Santa Barbara, Santa Barbara, USA
