A Survey of Current Integrative Network Algorithms for Systems Biology

  • Andrew K. Rider
  • Nitesh V. Chawla
  • Scott J. Emrich

Abstract

The goal of systems biology is to gain a more complete understanding of biological systems by viewing all of their components and the interactions between them simultaneously. Until recently, the most complete global view of a biological system was through the use of gene expression or protein-protein interaction data. With the increasing number of high-throughput technologies for measuring genomic, proteomic, and metabolomic data, scientists now have the opportunity to create complex network-based models for drug discovery, protein function annotation, and many other problems. Each technology used to measure a biological system inherently presents a limited view of the system. However, the combination of multiple technologies can provide a more complete picture. Much recent work has studied integrating these heterogeneous data types into single networks. Here we provide a survey of integrative network-based approaches to problems in systems biology. We focus on describing the variety of algorithms used in integrative network inference. Ultimately, the survey of current approaches leads us to the conclusion that there is an urgent need for a standard set of evaluation metrics and data sets in this field.

Keywords

  • Network inference
  • Integrative networks
  • Systems biology

Acronyms

  • PPI: Protein-protein interaction
  • GO: Gene ontology
  • TF: Transcription factor
  • TFBS: Transcription factor binding site
  • eQTL: Expression quantitative trait locus

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Andrew K. Rider (1)
  • Nitesh V. Chawla (1)
  • Scott J. Emrich (1)
  1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
