Assembly, Annotation, and Comparative Genomics in PATRIC, the All Bacterial Bioinformatics Resource Center

  • Alice R. Wattam (email author)
  • Thomas Brettin
  • James J. Davis
  • Svetlana Gerdes
  • Ronald Kenyon
  • Dustin Machi
  • Chunhong Mao
  • Robert Olson
  • Ross Overbeek
  • Gordon D. Pusch
  • Maulik P. Shukla
  • Rick Stevens
  • Veronika Vonstein
  • Andrew Warren
  • Fangfang Xia
  • Hyunseung Yoo
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1704)

Abstract

In the “big data” era, research biologists are faced with analyzing new types of data that usually require some level of computational expertise. A number of programs and pipelines exist, but acquiring the expertise to run them and then understanding their output can be a challenge.

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) has created an end-to-end analysis platform that allows researchers to take their raw reads, assemble a genome, annotate it, and then use a suite of user-friendly tools to compare it to any public data that is available in the repository. With close to 113,000 bacterial and more than 1000 archaeal genomes, PATRIC creates a unique research experience with “virtual integration” of private and public data. PATRIC contains many diverse tools and functionalities to explore both genome-scale and gene expression data, but the main focus of this chapter is on assembly, annotation, and the downstream comparative analysis functionality that is freely available in the resource.

Key words

Assembly, Annotation, Comparative genomics, Bacteria, Archaea, Bioinformatics

Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Alice R. Wattam (1; email author)
  • Thomas Brettin (2, 3)
  • James J. Davis (2, 3)
  • Svetlana Gerdes (3, 4)
  • Ronald Kenyon (1)
  • Dustin Machi (1)
  • Chunhong Mao (1)
  • Robert Olson (2, 5)
  • Ross Overbeek (3, 4)
  • Gordon D. Pusch (3, 4)
  • Maulik P. Shukla (2, 3)
  • Rick Stevens (2, 3, 6)
  • Veronika Vonstein (3, 4)
  • Andrew Warren (1)
  • Fangfang Xia (2, 5)
  • Hyunseung Yoo (2, 3)
  1. Biocomplexity Institute, Virginia Tech, Blacksburg, USA
  2. Computation Institute, University of Chicago, Chicago, USA
  3. Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, USA
  4. Fellowship for Interpretation of Genomes, Burr Ridge, USA
  5. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA
  6. Department of Computer Science, University of Chicago, Chicago, USA
