Abstract
Metagenomicstudies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. We present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. We also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. We tested our method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that our framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., et al.: Bacterial Rhodopsin: Evidence for a New Type of Phototrophy in the Sea. Science 289(5486), 1902–1906 (2000)
Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., et al.: Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 304(5667), 66–74 (2004)
Angly, E.A., Felts, B., Salamon, P., Edwards, E.A., Carlson, C., et al.: The Marine Viromes of Four Oceanic Regions. PLoS Biol. 4(11) (2006)
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., et al.: Community Structure and Metabolism through Reconstruction of Microbial Genomes from the Environment. Nature 428(6978), 37–43 (2004)
Gill, S.R., Pop, M., Deboy, R.T., Eckburg, P.B., Turnbaugh, P.J., et al.: Metagenomic Analysis of the Human Distal Gut Microbiome. Science 312(5778), 1355–1359 (2006)
DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., et al.: Community Genomics among Stratified Microbial Assemblages in the Ocean’s Interior. Science 311(5760), 496–503 (2006)
Markowitz, V.M., Szeto, E., Palaniappan, K., Grechkin, Y., Chu, K., et al.: The Integrated Microbial Genomes (IMG) System in 2007: Data Content and Analysis Tool Extensions. Nucleic Acids Res. 36(Database Issue), DS528–DS533 (2008)
Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., et al.: The COG Database: an Updated Version Includes Eukaryotes. BMC Bioinformatics 4, 41 (2003)
Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, J.S., et al.: The Pfam Protein Families Database. Nucleic Acids Res. 36(Database Issue), D281–D288 (2008)
Haft, D.H., Selengut, J.D., White, O.: The TIGRFAMs Database of Protein Families. Nucleic Acids Res. 31, 371–373 (2003)
Rodriguez-Brito, B., Rohwer, F., Edwards, R.A.: An Application of Statistics to Comparative Metagenomics. BMC Bioinformatics 20(7), 162 (2006)
Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., et al.: Comparative Metagenomics of Microbial Communities. Science 308(5721), 554–557 (2005)
Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., et al.: The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol. 5(3), e77 (2007)
Yooseph, S., Sutton, G., Rusch, D.B., Halpern, A.L., Williamson, S.J., et al.: The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. PLoS Biol. 5(3), e16 (2007)
Overbeek, R., Begley, T., Butler, R.M., Choudhuri, J.V., Chuang, H.Y., et al.: The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes. Nucleic Acids Res. 33, 5691–5702 (2005)
Lander, E.S., Waterman, M.S.: Genomic Mapping by Fingerprinting Random Clones: a Mathematical Analysis. Genomics 2(3), 231–239 (1988)
Schloss, P.D., Handelssman, J.: A Statistical Toolbox for Metagenomics: Assessing Functional Diversity in Microbial Communities. BMC Bioinformatics 9(34) (2008)
Sorek, R., Zhu, Y., Creevey, C., Francino, M.P., Bork, P., Rubin, E.M.: Genome-wide Experimental Determination of Barriers to Horizontal Gene Transfer. Science 318(5855), 1449–1452 (2007)
Mavromatis, K., Ivanova, N., Barry, K., Shapiro, H., Goltsman, E., et al.: Use of Simulated Data Sets to Evaluate the Fidelity of Metagenomic Processing Methods. Nature Methods 4, 495–500 (2007)
Sanger, F., Coulson, A.R., Hong, G.F., Hill, D.F., Petersen, G.B.: Nucleotide Sequence of Bacteriophage Lambda DNA. J. Mol. Biol. 162, 4 (1982)
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., et al.: Whole-genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 269(5223), 496–512 (1995)
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., et al.: The Sequence of the Human Genome. Science 291(5507), 1304–1351 (2001)
Kanehisa, M., Goto, S.: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000)
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410 (1990)
MartÃn-Cuadrado, A.B., López-GarcÃa, P., Gottschalk, G., RodrÃguez-Valera, F.: Metagenomics of the Deep Mediterranean, a Warm Bathypelagic Habitat. PLoS ONE 2, 914 (2007)
Warnecke, F., Luginbuhl, P., Ivanova, N., Ghassemian, M., Richardson, T.H., et al.: Metagenomic and Functional Analysis of Hindgut Microbiota of a Wood Feeding Higher Termite. Nature 450, 560–565 (2007)
Marchler-Bauer, A., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., et al.: Specific Functional Annotation with the Conserved Domain Database. Nucleic Acids Res. 37(Database Issue), D205–D210
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sharon, I., Pati, A., Markowitz, V.M., Pinter, R.Y. (2009). A Statistical Framework for the Functional Analysis of Metagenomes. In: Batzoglou, S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science(), vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_35
Download citation
DOI: https://doi.org/10.1007/978-3-642-02008-7_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02007-0
Online ISBN: 978-3-642-02008-7
eBook Packages: Computer ScienceComputer Science (R0)