Bioinformatics pp 225-269

Genomic Database Searching

  • James R. A. Hutchins
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1525)

Abstract

The availability of reference genome sequences for virtually all species under active research has revolutionized biology. Analyses of genomic variations in many organisms have provided insights into phenotypic traits, evolution and disease, and are transforming medicine. All genomic data from publicly funded projects are freely available in Internet-based databases, for download or searching via genome browsers such as Ensembl, Vega, NCBI’s Map Viewer, and the UCSC Genome Browser. These online tools generate interactive graphical outputs of relevant chromosomal regions, showing genes, transcripts, and other genomic landmarks, and epigenetic features mapped by projects such as ENCODE.

This chapter provides a broad overview of the major genomic databases and browsers, and describes various approaches and the latest resources for searching them. Methods are provided for identifying genomic locus and sequence information using gene names or codes, identifiers for DNA and RNA molecules and proteins; also from karyotype bands, chromosomal coordinates, sequences, motifs, and matrix-based patterns. Approaches are also described for batch retrieval of genomic information, performing more complex queries, and analyzing larger sets of experimental data, for example from next-generation sequencing projects.
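As a rough illustration of the motif-based searching mentioned above, the following minimal sketch converts a short motif written with IUPAC ambiguity codes into a regular expression and reports matches as BED-style, 0-based, half-open coordinates. It is not taken from the chapter: the function names `motif_to_regex` and `find_motif`, the chromosome label `chrDemo`, and the toy sequence are all hypothetical, and only the forward strand is scanned. Genome-scale searches would normally rely on dedicated tools rather than on a script like this.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'GAWTTC' into a RegEx string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif, chrom="chrDemo"):
    """Return (chrom, start, end) tuples for every forward-strand match.

    Coordinates are 0-based and half-open, as in the BED format.
    A lookahead is used so that overlapping matches are also reported.
    The default chromosome label 'chrDemo' is a placeholder (assumption).
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = []
    for match in pattern.finditer(sequence.upper()):
        start = match.start(1)
        hits.append((chrom, start, start + len(match.group(1))))
    return hits


if __name__ == "__main__":
    toy_sequence = "TTGAATTCAAGAATTC"  # made-up sequence for illustration only
    for chrom, start, end in find_motif(toy_sequence, "GAATTC"):
        # Print one tab-separated BED-like line per hit.
        print(f"{chrom}\t{start}\t{end}")
```

Reporting hits as BED-style intervals is a design choice made here so that the output could, in principle, be loaded as a custom track in a genome browser; scanning the reverse strand or scoring with a position weight matrix would need additional code.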

Key words

Bioinformatics; Epigenetics; Genome browsers; Identifiers; Internet-based software; Next-generation sequencing; Motifs; Matrices; Sequences

Abbreviations

API: Application Programming Interface
BED: Browser Extensible Data
BLAST: Basic Local Alignment Search Tool
BLAT: BLAST-Like Alignment Tool
DDBJ: DNA Databank of Japan
EBI: European Bioinformatics Institute
EMBOSS: European Molecular Biology Open Software Suite
ENA: European Nucleotide Archive
ENCODE: Encyclopedia Of DNA Elements
FTP: File Transfer Protocol
GI: GenInfo Identifier
GOLD: Genomes Online Database
GRC: Genome Reference Consortium
GUI: Graphical User Interface
HAVANA: Human and Vertebrate Analysis and Annotation
ID: Identifier Code
INSDC: International Nucleotide Sequence Database Collaboration
NCBI: National Center for Biotechnology Information
NGS: Next-Generation Sequencing
PWM: Position Weight Matrix
RegEx: Regular Expression
REST: Representational State Transfer
ROI: Region of Interest
RSAT: Regulatory Sequence Analysis Tools
UCSC: University of California Santa Cruz
URL: Uniform Resource Locator
Vega: Vertebrate Genome Annotation

Notes

Acknowledgements

I would like to thank the numerous developers and support staff of genomic databases who provided valuable information during the researching and writing of this chapter. Grateful thanks also go to colleagues past and present who provided helpful information and advice. During the preparation of this chapter I worked in the laboratory of Dr. M. Méchali, whom I gratefully acknowledge for his guidance and support. I was supported financially by La Fondation pour la Recherche Médicale (FRM), and by the Centre National de la Recherche Scientifique (CNRS).

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Institute of Human Genetics (IGH), CNRS, Montpellier, France