Abstract
The rapid accumulation of biological network data is creating an urgent need for computational methods capable of integrative network analysis. This paper discusses a suite of algorithms that we have developed to discover biologically significant patterns that appear frequently in multiple biological networks: coherent dense subgraphs, frequent dense vertex-sets, generic frequent subgraphs, differential subgraphs, and recurrent heavy subgraphs. We demonstrate these methods on gene co-expression networks, using the identified patterns to systematically annotate gene functions, map genome to phenome, and perform high-order cooperativity analysis.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Li, W., Hu, H., Huang, Y. et al. Frequent Pattern Discovery in Multiple Biological Networks: Patterns and Algorithms. Stat Biosci 4, 157–176 (2012). https://doi.org/10.1007/s12561-011-9047-0
DOI: https://doi.org/10.1007/s12561-011-9047-0