From Gene Annotation to Function Prediction for Metagenomics

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1611)

Abstract

Microbes play important roles in almost every aspect of life, including human health and disease. Facilitated by the rapid development of sequencing technologies, metagenomics research has accelerated the accumulation of genomic sequences from microbial species that were previously inaccessible. Analysis of metagenomic sequencing data can reveal not only the species composition but also the functional composition of microbial communities. Here, we report Fun4Me, a pipeline for functional annotation of metagenomic datasets. The pipeline is built from several programs that we have developed for metagenomic sequence analysis, including a protein-coding gene predictor for short reads (or contigs) and a fast similarity search tool. Given a metagenomic dataset, the pipeline reports putative protein-coding genes (or gene fragments), functional annotations of those genes as Gene Ontology (GO) terms and Enzyme Commission (EC) numbers, and the metabolic pathways that are likely encoded by the metagenome. Fun4Me is available for download at https://sourceforge.net/projects/fun4me.
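
To make the annotation step concrete, the following is a minimal sketch of one common way to turn similarity search results into GO and EC annotations: each predicted gene inherits the annotations of its best-scoring database hit. This is an illustrative assumption, not a description of how Fun4Me itself assigns functions. The function names, the hit tuple layout, the E-value cutoff, and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def transfer_annotations(hits, subject_annotations, max_evalue=1e-3):
    """Give each predicted gene the annotations of its best hit.

    hits: iterable of (gene_id, subject_id, evalue) tuples
          (hypothetical layout, e.g. parsed from tabular search output).
    subject_annotations: dict mapping subject_id -> set of GO terms / EC numbers.
    max_evalue: assumed significance cutoff; hits above it are ignored.
    """
    best = {}  # gene_id -> (subject_id, evalue) of the best hit seen so far
    for gene_id, subject_id, evalue in hits:
        if evalue > max_evalue:
            continue  # not significant enough to transfer a function
        if subject_id not in subject_annotations:
            continue  # an unannotated subject carries no function to transfer
        if gene_id not in best or evalue < best[gene_id][1]:
            best[gene_id] = (subject_id, evalue)
    return {gene: set(subject_annotations[subj]) for gene, (subj, _) in best.items()}


def count_functions(gene_annotations):
    """Count how many genes carry each GO term or EC number."""
    counts = defaultdict(int)
    for terms in gene_annotations.values():
        for term in terms:
            counts[term] += 1
    return dict(counts)


# Illustrative data only: gene and subject identifiers are made up.
example_hits = [
    ("gene1", "P1", 1e-30),
    ("gene1", "P2", 1e-5),
    ("gene2", "P3", 0.5),    # fails the E-value cutoff
    ("gene3", "P2", 1e-10),
]
example_annotations = {
    "P1": {"GO:0016301", "EC:2.7.1.1"},
    "P2": {"GO:0005524"},
    "P3": {"GO:0003677"},
}

gene_functions = transfer_annotations(example_hits, example_annotations)
function_profile = count_functions(gene_functions)
```

Here `gene_functions` maps each annotated gene to its inherited terms, and `function_profile` summarizes the functional composition of the sample as per-term gene counts. Best-hit transfer is simple but can propagate annotation errors, so a real pipeline may apply stricter or more elaborate rules.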

Keywords

Metagenomics; Similarity search; Function prediction; Gene Ontology (GO); Metabolic pathway

Notes

Acknowledgments

This work was supported by NIH grant 1R01AI108888 and NSF grant DBI-0845685.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. School of Informatics and Computing, Indiana University, Bloomington, USA
