Application of Bayesian networks on large-scale biological data

  • Review
Frontiers in Biology

Abstract

The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole-genome sequencing has made it possible to examine the behavior of all the genes in a genome with high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, the Bayesian network (BN) is a powerful tool for the ab initio identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when applying BNs to infer such interactions. In this paper, we introduce the Bayesian network formalism and its applications in systems biology, and we review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea and the relative pros and cons of each technique, and discuss possible ways to combine different algorithms to make BN learning more accurate and much faster.

Author information

Correspondence to Yi Liu or Jing-Dong J. Han.

About this article

Cite this article

Liu, Y., Han, JD.J. Application of Bayesian networks on large-scale biological data. Front. Biol. 5, 98–104 (2010). https://doi.org/10.1007/s11515-010-0023-8

  • DOI: https://doi.org/10.1007/s11515-010-0023-8
