Probabilistic Graphical Modeling in Systems Biology: A Framework for Integrative Approaches

  • Christine Sinoquet

Abstract

Systems biology may be defined as a discipline that aims to integrate various sources of heterogeneous data in order to describe and predict the function of biological systems. The purpose is to combine many (possibly weak) pieces of evidence from several data types that describe different biological features of genes or proteins. Probabilistic graphical models offer an appealing framework for this objective. Through a thorough review of five selected examples, this chapter highlights how probabilistic graphical models can help build the bridge between biology and computational modeling. Within this methodological framework, the five cases illustrate three features of these models, which we discuss: flexibility, scalability, and the ability to combine heterogeneous sources of data. The applications covered are genetic association studies, identification of protein–protein interactions, identification of the target genes of transcription factors, inference of causal phenotype networks, and protein function prediction.

Keywords

Systems biology; Integrative approach; Integration of omics data; Heterogeneous sources of data; Computational modeling; Machine learning; Probabilistic framework; Probabilistic graphical model; Bayesian network; Markov random field

List of Acronyms

BN: Bayesian network
ChIP-chip: Chromatin immunoprecipitation on chip
ChIP-seq: Chromatin immunoprecipitation followed by sequencing
CPN: Causal phenotype network
DDI: Domain-domain interaction
DNA: Deoxyribonucleic acid
GA: Genetic architecture
GO: Gene ontology
GOS: GO sub-ontology
GWAS: Genome-wide association study
MCMC: Monte Carlo Markov chain
MRF: Markov random field
MRF-MJM: MRF mixture joint model
PGM: Probabilistic graphical model
PPI: Protein–protein interaction
QTL: Quantitative trait loci
RNA: Ribonucleic acid
RNAi: RNA interference
ROC curve: Receiver operating characteristic curve
SMM: Standard mixture model
TF: Transcription factor

Notes

Acknowledgments

The author wishes to thank the anonymous reviewer for constructive comments on the manuscript and for feedback that was most helpful in producing the final version.

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. LINA, UMR CNRS 6241, Université de Nantes, Nantes Cedex, France
