
Genomic Databases and Resources at the National Center for Biotechnology Information

Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 609)

Abstract

The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expression, gene variation, gene families, proteins, and protein domains are integrated with analytical, search, and retrieval resources through the NCBI Web site. Entrez, a text-based search and retrieval system, provides a fast and easy way to navigate across these diverse biological databases.
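Entrez queries can also be issued programmatically through NCBI's E-utilities web service. The sketch below only builds request URLs for ESearch (find record IDs matching a query) and EFetch (retrieve full records); it does not contact NCBI. The helper names `esearch_url` and `efetch_url` are hypothetical, and the example query uses standard Entrez field tags (`[gene]`, `[orgn]`). Consult the current E-utilities documentation for usage limits and additional parameters before sending real requests.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Hypothetical helper: build an ESearch URL asking Entrez for IDs in `db` matching `term`."""
    if not db or not term:
        raise ValueError("db and term must be non-empty")
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and turns spaces into '+'.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str, retmode: str = "text") -> str:
    """Hypothetical helper: build an EFetch URL that retrieves full records for the given IDs."""
    if not ids:
        raise ValueError("at least one ID is required")
    # EFetch accepts a comma-separated list of IDs in a single request.
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Step 1: search the Gene database; Step 2 would pass the returned IDs to EFetch.
    print(esearch_url("gene", "BRCA1[gene] AND human[orgn]"))
    print(efetch_url("nuccore", ["NM_007294"], "fasta"))
```

In practice the two calls are chained: the IDs parsed from the ESearch response are passed to EFetch, which mirrors the search-then-retrieve navigation that Entrez offers through the Web interface.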

Customized genomic BLAST runs sequence similarity searches against special collections of organism-specific sequence data and displays the resulting alignments in their genomic context in NCBI's genome browser, Map Viewer.

Comparative genome analysis tools lead to a better understanding of evolutionary processes and quicken the pace of discovery.



Acknowledgments

The authors would like to thank, in alphabetical order, Vyacheslav Chetvernin, Boris Fedorov, Andrei Kochergin, Peter Meric, Sergei Resenchuk, and Martin Shumway for their expertise and diligence in the design and maintenance of the databases highlighted in this publication, and Stacy Ciufo for helpful discussions and comments. These projects represent the efforts of many NCBI staff members along with the collective contributions of many dedicated scientists worldwide.


Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Tatusova, T. (2010). Genomic Databases and Resources at the National Center for Biotechnology Information. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 609. Humana Press. https://doi.org/10.1007/978-1-60327-241-4_2


  • DOI: https://doi.org/10.1007/978-1-60327-241-4_2


  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-240-7

  • Online ISBN: 978-1-60327-241-4

  • eBook Packages: Springer Protocols
