Genomic Database Searching

Part of the book series: Methods in Molecular Biology (MIMB, volume 1525)

Abstract

The availability of reference genome sequences for virtually all species under active research has revolutionized biology. Analyses of genomic variations in many organisms have provided insights into phenotypic traits, evolution and disease, and are transforming medicine. All genomic data from publicly funded projects are freely available in Internet-based databases, for download or searching via genome browsers such as Ensembl, Vega, NCBI’s Map Viewer, and the UCSC Genome Browser. These online tools generate interactive graphical outputs of relevant chromosomal regions, showing genes, transcripts, and other genomic landmarks, and epigenetic features mapped by projects such as ENCODE.

This chapter provides a broad overview of the major genomic databases and browsers, and describes various approaches and the latest resources for searching them. Methods are provided for identifying genomic locus and sequence information from gene names or codes and from identifiers for DNA and RNA molecules and proteins, as well as from karyotype bands, chromosomal coordinates, sequences, motifs, and matrix-based patterns. Approaches are also described for batch retrieval of genomic information, performing more complex queries, and analyzing larger sets of experimental data, for example from next-generation sequencing projects.
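To make the motif and matrix-based searches mentioned above more concrete, the following is a minimal Python sketch and not the chapter's own method. It scans a short sequence first with a regular expression (RegEx) and then with a position weight matrix (PWM) built from base counts. The sequence, the count matrix, the pseudocount, the uniform background frequency, the score threshold, and all function names are assumptions invented for this illustration. Only the EcoRI recognition site GAATTC is a real motif.

```python
import math
import re

# Hypothetical DNA sequence, invented for this illustration.
SEQUENCE = "TTGACGTCAGGAATTCTTGACGTAACC"

# Hypothetical base counts at four motif positions (consensus GACG).
COUNTS = [
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each regex match, overlaps included."""
    # A lookahead lets matches that overlap each other all be reported.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan_pwm(sequence, matrix, threshold):
    """Return (start, window, score) for windows scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Exact motif: the EcoRI recognition site.
    print(find_motif(SEQUENCE, "GAATTC"))
    # Degenerate motif written as a regular expression.
    print(find_motif(SEQUENCE, "TGACGT[CA]A"))
    # Matrix-based search with an arbitrary score threshold.
    print(scan_pwm(SEQUENCE, log_odds_matrix(COUNTS), threshold=6.0))
```

Positions are 0-based offsets into the sequence string. The sketch scans only the given strand, so a genome-scale search would also need to scan the reverse complement and would normally use a dedicated tool.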

Abbreviations

API: Application Programming Interface
BED: Browser Extensible Data
BLAST: Basic Local Alignment Search Tool
BLAT: BLAST-Like Alignment Tool
DDBJ: DNA Databank of Japan
EBI: European Bioinformatics Institute
EMBOSS: European Molecular Biology Open Software Suite
ENA: European Nucleotide Archive
ENCODE: Encyclopedia Of DNA Elements
FTP: File Transfer Protocol
GI: GenInfo Identifier
GOLD: Genomes Online Database
GRC: Genome Reference Consortium
GUI: Graphical User Interface
HAVANA: Human and Vertebrate Analysis and Annotation
ID: Identifier Code
INSDC: International Nucleotide Sequence Database Collaboration
NCBI: National Center for Biotechnology Information
NGS: Next-Generation Sequencing
PWM: Position Weight Matrix
RegEx: Regular Expression
REST: Representational State Transfer
ROI: Region of Interest
RSAT: Regulatory Sequence Analysis Tools
UCSC: University of California Santa Cruz
URL: Uniform Resource Locator
Vega: Vertebrate Genome Annotation

Acknowledgements

I would like to thank the numerous developers and support staff of genomic databases who provided valuable information during the research for and writing of this chapter. Grateful thanks also go to colleagues past and present who provided helpful information and advice. During the preparation of this chapter I worked in the laboratory of Dr. M. Méchali, whom I gratefully acknowledge for his guidance and support. I was supported financially by La Fondation pour la Recherche Médicale (FRM), and by the Centre National de la Recherche Scientifique (CNRS).

Corresponding author

Correspondence to James R. A. Hutchins.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Hutchins, J.R.A. (2017). Genomic Database Searching. In: Keith, J. (ed) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_10

  • DOI: https://doi.org/10.1007/978-1-4939-6622-6_10

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6620-2

  • Online ISBN: 978-1-4939-6622-6

  • eBook Packages: Springer Protocols
