Pattern Mining Across Many Massive Biological Networks

  • Wenyuan Li
  • Haiyan Hu
  • Yu Huang
  • Haifeng Li
  • Michael R. Mehan
  • Juan Nunez-Iglesias
  • Min Xu
  • Xifeng Yan
  • Xianghong Jasmine Zhou


The rapid accumulation of biological network data is creating an urgent need for computational methods for integrative network analysis. Thus far, most such methods have focused on the analysis of single biological networks. This chapter discusses a suite of methods we developed to mine patterns across many biological networks. Such patterns include frequent dense subgraphs, frequent dense vertex sets, generic frequent patterns, and differential subgraph patterns. Using the identified network patterns, we systematically perform gene functional annotation, regulatory network reconstruction, and genome-to-phenome mapping. Finally, we discuss tensor computation of multiple weighted biological networks, which fills a gap in integrative network biology.



The work presented in this chapter was supported by National Institutes of Health Grants R01GM074163, P50HG002790, and U54CA112952 and NSF Grants 0515936, 0747475 and DMS-0705312.


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Wenyuan Li (1)
  • Haiyan Hu (2)
  • Yu Huang (1)
  • Haifeng Li (3)
  • Michael R. Mehan (1)
  • Juan Nunez-Iglesias (1)
  • Min Xu (1)
  • Xifeng Yan (4)
  • Xianghong Jasmine Zhou (5)

  1. University of Southern California, Los Angeles, USA
  2. University of Central Florida, Orlando, USA
  3. Motorola Labs, Los Angeles, USA
  4. University of California, Santa Barbara, USA
  5. Program in Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, USA
