Abstract
Vast and growing amounts of genomic and genetic data are now available. Although much of this data is accessible through relatively simple web queries, some questions require more complex queries. This paper reviews the hierarchy of tools for querying genetic and genomic data. For querying multiple genes, variants, or regions, Ensembl BioMart and the UCSC Table Browser offer flexible interfaces. For more complex queries, Galaxy is a sophisticated tool for building workflows over existing internet resources. For the most challenging genome-scale queries, programmatic access may be required through a defined application programming interface (API), such as the one provided by Ensembl. All these tools allow one to rapidly ask many questions that were difficult to answer a few years ago, but choosing the appropriate tool for the job is critical.
© 2010 Springer Science+Business Media, LLC
Cite this protocol
Woollard, P.M. (2010). Asking Complex Questions of the Genome Without Programming. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_3
DOI: https://doi.org/10.1007/978-1-60327-367-1_3
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-366-4
Online ISBN: 978-1-60327-367-1
eBook Packages: Springer Protocols