Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Query Languages and Evaluation Techniques for Biological Sequence Data

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_630-2

Synonyms

Definition

Biological sequence data is a common type of data in life science applications. DNA and protein sequence collections are growing rapidly; for example, the data at GenBank [GB07] has been growing exponentially, doubling roughly every 18 months. These datasets are often queried in complex ways, and the methods required go far beyond the simple string matching used in more traditional string applications. To let users pose sophisticated queries on biological sequences easily, several languages have been designed that support a rich library of functions. In addition, some database systems have been extended to support a rich set of operators on the sequence data type. Compared to the stand-alone approach, the database method brings the power of algebraic query optimization and the use of indexes, making it...
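To illustrate why sequence queries go beyond simple string matching, the following minimal sketch contrasts an exact substring test with a local alignment score in the style of Smith and Waterman (see Recommended Reading). It is not taken from the entry: the function name `local_alignment_score`, the scoring values (match 2, mismatch -1, gap -2), and the two example sequences are assumptions chosen only for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Smith-Waterman style dynamic programming with a linear gap penalty.
    The scoring parameters are illustrative assumptions, not values
    prescribed by the entry.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical query and target that differ by a single substitution.
query = "ACGTTGCA"
target = "TTACGTAGCATT"

exact_hit = query in target                    # exact matching finds nothing
score = local_alignment_score(query, target)   # alignment still scores highly
```

With these inputs the exact test fails, while the alignment score stays close to the maximum possible for a perfect match (16 under the assumed scoring), which is the kind of approximate similarity that sequence query operators need to express.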

Keywords

Suffix 

Recommended Reading

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
  2. Eckman BA, Kaufmann A. Querying BLAST within a data federation. Q Bull IEEE TC Data Eng. 2004;27(3):12–9.
  3. Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. Atlas Protein Seq Struct. 1978;5:345–52.
  4. Hammer J, Schneider M. Genomics algebra: a new, integrating data model, language, and tool for processing and querying genomic information. In: Proceedings 1st biennial conference on innovative data systems research. 2003. p. 176–87.
  5. Henikoff S, Henikoff J. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci. 1992;89(22):10915–9.
  6. Hsiao R-L, Parker DS Jr, Yang H-C. Support for BioIndexing in BLASTgres. In: Data Integration in the Life Sciences (DILS), LNCS, vol. 3615. Berlin: Springer; 2005. p. 284–7.
  7. Mao R, Xu W, Singh N, Miranker DP. An assessment of a metric space database index to support sequence homology. In: Proceedings IEEE 3rd international symposium on bioinformatics and bioengineering. 2003. p. 375–82.
  8. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci. 1988;85(8):2444–8.
  9. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147:195–7.
  10. Stephens S, Chen JY, Davidson MG, Thomas S, Trute BM. Oracle Database 10g: a platform for BLAST search and regular expression pattern matching in life sciences. Nucleic Acids Res. 2005;33(Database-Issue):675–9.
  11. Stephens S, Chen JY, Thomas S. ODM BLAST: sequence homology search in the RDBMS. Q Bull IEEE TC Data Eng. 2004;27(3):20–3.
  12. Tata S, Lang W, Patel JM. Periscope/SQ: interactive exploration of biological sequence databases. In: Proceedings 33rd international conference on very large data bases. 2007. p. 1406–9.
  13. Tata S, Patel JM. PiQA: an algebra for querying protein data sets. In: Proceedings 15th international conference on scientific and statistical database management. 2003. p. 141–50.
  14. Tata S, Patel JM, Friedman JS, Swaroop A. Declarative querying for biological sequences. In: Proceedings 22nd international conference on data engineering. 2006. p. 87.
  15. Weiner P. Linear pattern matching algorithms. In: Proceedings of the 14th annual IEEE symposium on switching and automata theory. 1973. p. 1–11.

Copyright information

© Springer Science+Business Media LLC 2016

Authors and Affiliations

  1. IBM Almaden Research Center, San Jose, USA
  2. University of Wisconsin-Madison, Madison, USA