Update on Genomic Databases and Resources at the National Center for Biotechnology Information

Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)

Abstract

The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website. A text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases.

Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for information management systems and visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.



Acknowledgements

The authors would like to thank, in alphabetical order, Boris Fedorov and Sergei Resenchuk for their expertise and diligence in the design and maintenance of the databases highlighted in this publication, and Stacy Ciufo for helpful discussion and comments. These projects represent the efforts of many NCBI staff members along with the collective contributions of many dedicated scientists worldwide.

Corresponding author

Correspondence to Tatiana Tatusova.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Tatusova, T. (2016). Update on Genomic Databases and Resources at the National Center for Biotechnology Information. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_1

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_1

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
