A Statistical Framework for the Functional Analysis of Metagenomes

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Abstract

Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. We present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. We also provide a novel method for assessing the reliability of the estimates, which can be used to remove seemingly unreliable measurements. We tested our method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that our framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
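
As a rough illustration of the kind of reasoning Lander-Waterman theory supports, the sketch below computes textbook shotgun-sequencing quantities and uses them to length-normalize a read count. It is not the estimator from the paper, whose details are not given in the abstract. The function names, the uniform-random-read model, the single linear genome, and the minimum-overlap parameter are all assumptions made for this example.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Lander-Waterman coverage c = N * L / G."""
    return num_reads * read_len / genome_len


def fraction_covered(num_reads, read_len, genome_len):
    """Expected fraction of bases hit by at least one read: 1 - exp(-c)."""
    return 1.0 - math.exp(-coverage(num_reads, read_len, genome_len))


def expected_hits(num_reads, read_len, genome_len, gene_len, min_overlap):
    """Expected number of reads overlapping one gene copy by >= min_overlap bases.

    Assumes read start positions are uniform over the genome and ignores
    edge effects. A read overlaps the gene sufficiently when its start
    falls in a window of gene_len + read_len - 2 * min_overlap + 1 positions.
    """
    window = gene_len + read_len - 2 * min_overlap + 1
    return num_reads * window / genome_len


def estimate_copies(observed_hits, num_reads, read_len, genome_len,
                    gene_len, min_overlap):
    """Hypothetical length-normalized copy estimate: observed / expected per copy."""
    per_copy = expected_hits(num_reads, read_len, genome_len,
                             gene_len, min_overlap)
    return observed_hits / per_copy


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    print(coverage(1000, 100, 100_000))
    print(fraction_covered(1000, 100, 100_000))
    print(expected_hits(1000, 100, 100_000, gene_len=900, min_overlap=50))
```

Under this model, a longer gene attracts proportionally more reads than a shorter one at the same copy number, which is one reason raw read counts can misrepresent gene family frequencies unless they are normalized.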

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sharon, I., Pati, A., Markowitz, V.M., Pinter, R.Y. (2009). A Statistical Framework for the Functional Analysis of Metagenomes. In: Batzoglou, S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_35

  • DOI: https://doi.org/10.1007/978-3-642-02008-7_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02007-0

  • Online ISBN: 978-3-642-02008-7

  • eBook Packages: Computer Science, Computer Science (R0)
