Abstract
Database searching is perhaps the fastest, cheapest, and most powerful experiment a biologist can perform. No other 10-s test allows a biologist to reveal so much about the function, structure, location, or origin of a gene, protein, organelle, or organism. A database search does not consume any reagents or require any specific wet-bench laboratory skills; just about anyone can do it, but the key is to do it correctly. The power of database searching comes not only from the size of today’s sequence databases (now containing more than 700,000 annotated gene and protein sequences), but also from the ingenuity of certain key algorithms that have been developed to facilitate this very special kind of searching. Given the importance of database searching, it is crucial that today’s life scientists become as familiar as possible with the details of the process. Indeed, the intent of this chapter is to provide the reader with some insight into, and historical background on, the methods and algorithms that form the foundation of a few of the most common database searching techniques. These simple but incredibly useful computer experiments have many strengths and weaknesses, and they are subject to many misconceptions.
Suggested Readings
DNA Versus Protein
Baxevanis, A. D. and Ouellette, B. F. F. (2001) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd ed., John Wiley & Sons, NY.
Doolittle, R. F. (1986) Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences, University Science Books, Mill Valley, CA.
Dynamic Programming and Sequence Similarity
Needleman, S. B. and Wunsch, C. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 48, 443–453.
Dynamic Programming: The Algorithm
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool, J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25, 3389–3402.
Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches, Science 227, 1435–1441.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison, Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Pearson, W. R. (2000) Flexible sequence similarity searching with the FASTA3 program package, Methods Mol. Biol. 132, 185–219.
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences, J. Mol. Biol. 147, 195–197.
Scoring Matrices: The Dayhoff (PAM) Scoring Matrices
Dayhoff, M. O., Barker, W. C., and Hunt, L. T. (1983) Establishing homologies in protein sequences, Methods Enzymol. 91, 534–545.
Scoring Matrices: The BLOSUM Scoring Matrices
Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89, 10915–10919.
Henikoff, S. and Henikoff, J. G. (1991) Automated assembly of protein blocks for database searching, Nucleic Acids Res. 19, 6565–6572.
Fast Local Alignment Methods
Wootton, J. C. and Federhen, S. (1996) Analysis of compositionally biased regions in sequence databases, Methods Enzymol. 266, 554–571.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Wishart, D.S. (2003). Sequence Similarity and Database Searching. In: Krawetz, S.A., Womble, D.D. (eds) Introduction to Bioinformatics. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59259-335-4_27
DOI: https://doi.org/10.1007/978-1-59259-335-4_27
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-58829-241-4
Online ISBN: 978-1-59259-335-4
eBook Packages: Springer Book Archive