A Bayesian Approach to High-Throughput Biological Model Generation

  • Xinghua Shi
  • Rick Stevens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5462)


With hundreds, and soon thousands, of complete genomes available, the construction of genome-scale metabolic models for these organisms has attracted much attention. Manual work still dominates model generation, however, and this leaves a large gap between the number of complete genomes and the number of genome-scale metabolic models. The challenge in constructing genome-scale models from existing databases is that a model extracted directly from them is usually incomplete and contains network holes. Network holes occur when a network is disconnected and certain metabolites cannot be produced or consumed. To construct a valid metabolic model, these holes must be filled by introducing candidate reactions into the network. As a step toward the high-throughput generation of biological models, we propose a Bayesian approach to improving draft genome-scale metabolic models. A collection of 23 types of biological and topological evidence is extracted from the SEED [1], KEGG [2], and BiGG [3] databases. Based on this evidence, we create 23 individual predictors using Bayesian approaches. To combine these individual predictors and unify their predictions, we build ensembles based on majority vote and on four classifiers: a naive Bayes classifier, a Bayesian network, a multilayer perceptron network, and AdaBoost. A set of experiments is performed to train and test the individual predictors and the mechanisms that integrate them, and to evaluate the performance of our approach.
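
The two ideas named in the abstract, network holes and majority voting over individual predictors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the reaction representation (a dict of substrate and product lists), the `boundary` parameter for metabolites that are allowed to enter or leave the network, the function names, and the tie-breaking rule. Reversible reactions are ignored for simplicity.

```python
def find_network_holes(reactions, boundary=frozenset()):
    """Return metabolites that are only produced or only consumed.

    `reactions` maps a reaction id to (substrates, products). This data
    layout is a hypothetical one chosen for the sketch. Metabolites listed
    in `boundary` (e.g. nutrients, secreted products) are never reported.
    """
    produced, consumed = set(), set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    # A metabolite that appears on only one side is a dead end ("hole").
    dead_ends = produced ^ consumed
    return dead_ends - set(boundary)


def majority_vote(votes):
    """Combine boolean predictor outputs.

    Returns True only if strictly more than half of the predictors vote
    True; ties count as False (an assumption of this sketch).
    """
    return 2 * sum(votes) > len(votes)


# Toy network: A -> B -> C is connected; D -> E is a disconnected fragment.
toy_reactions = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["D"], ["E"]),
}
holes = find_network_holes(toy_reactions, boundary={"A", "C"})

# Three hypothetical predictors voting on one candidate gap-filling reaction.
accept_candidate = majority_vote([True, True, False])
```

In this toy network, `D` and `E` are reported as holes because nothing produces `D` and nothing consumes `E`; a candidate reaction would be added to the model only when most predictors support it.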


Metabolic Network Individual Predictor Reaction Pair Knockout Reaction Pathway Segment 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Overbeek, R., Begley, T., Butler, R.M., Choudhuri, J.V., Chuang, H.Y., Cohoon, M., de Crécy-Lagard, V., Diaz, N., Disz, T., Edwards, R., Fonstein, M., Frank, E.D., Gerdes, S., Glass, E.M., Goesmann, A., Hanson, A., Iwata-Reuyl, D., Jensen, R., Jamshidi, N., Krause, L., Kubal, M., Larsen, N., Linke, B., McHardy, A.C., Meyer, F., Neuweger, H., Olsen, G., Olson, R., Osterman, A., Portnoy, V., Pusch, G.D., Rodionov, D.A., Rückert, C., Steiner, J., Stevens, R., Thiele, I., Vassieva, O., Ye, Y., Zagnitko, O., Vonstein, V.: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33(17), 5691–5702 (2005)
  2. Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y.: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, 480–484 (2008)
  3. BiGG: A Biochemical Genetic and Genomic Database of Large Scale Metabolic Reconstructions, http://bigg.ucsd.edu/
  4.
  5. Weka: Data mining software in Java, http://www.cs.waikato.ac.nz/~ml/weka/
  6. Feist, A.M., Herrgard, M.J., Thiele, I., Reed, J.L., Palsson, B.O.: Reconstruction of biochemical networks in microbial organisms. Nat. Rev. Microbiol. (2008)
  7. Palsson, B.: Systems biology: properties of reconstructed networks. Cambridge University Press, Cambridge (2006)
  8. Reed, J.L., Palsson, B.O.: Minireview: thirteen years of building constraint-based in silico models of Escherichia coli. Journal of Bacteriology, 2692–2699 (2003)
  9. Reed, J.L., Vo, T.D., Schilling, C.H., Palsson, B.O.: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4(9), R54 (2003)
  10. Edwards, J.S., Palsson, B.: Robustness analysis of the Escherichia coli metabolic network. Biotechnology Prog. 16, 927–939 (2000)
  11. Kharchenko, P., Chen, L., Freund, Y., Vitkup, D., Church, G.M.: Identifying metabolic enzymes with multiple types of association evidence. BMC Bioinformatics 7(1), 177 (2006)
  12. Chen, L., Vitkup, D.: Predicting genes for orphan metabolic activities using phylogenetic profiles. Genome Biol. 7, R17 (2006)
  13. DeJongh, M., Formsma, K., Boillot, P., Gould, J., Rycenga, M., Best, A.: Toward the automated generation of genome-scale metabolic networks in the SEED. BMC Bioinformatics 8(139) (2007)
  14. Green, M.L., Karp, P.D.: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 5(76) (2004)
  15. Kharchenko, P., Vitkup, D., Church, G.M.: Filling gaps in a metabolic network using expression information. Bioinformatics 20(suppl. 1), I178–I185 (2004)
  16. Gil, R., Silva, F.J., Pereto, J., Moya, A.: Determination of the core of a minimal bacterial gene set. Microbiology and Molecular Biology Reviews 68(3), 518–537 (2004)
  17. Overbeek, R., Begley, T., et al.: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33(17), 5691–5702 (2005)
  18. Aziz, R.K., Bartels, D., et al.: The RAST server: Rapid Annotations using Subsystems Technology. BMC Genomics 9(75) (2008)
  19. Becker, S.A., Palsson, B.O.: Genome-scale reconstruction of the metabolic network in Staphylococcus aureus N315: an initial draft to the two-dimensional annotation. BMC Microbiol. 5(8) (2005)
  20. Shi, X., Stevens, R.: SWARM: a scientific workflow for supporting Bayesian approaches to improve metabolic models. In: Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments (CLADE) (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xinghua Shi (1)
  • Rick Stevens (1, 2)
  1. Department of Computer Science, University of Chicago, Chicago, USA
  2. The Computing, Environment and Life Science, Argonne National Laboratory, Argonne, USA
