Sequence Similarity and Database Searching

Introduction to Bioinformatics

Abstract

Database searching is perhaps the fastest, cheapest, and most powerful experiment a biologist can perform. No other 10-second test reveals so much about the function, structure, location, or origin of a gene, protein, organelle, or organism. A database search consumes no reagents and requires no wet-bench laboratory skills; just about anyone can do it, but the key is to do it correctly. The power of database searching comes not only from the size of today’s sequence databases (now containing more than 700,000 annotated gene and protein sequences), but also from the ingenuity of the key algorithms developed to facilitate this very special kind of searching. Given the importance of database searching, it is crucial that today’s life scientists become as familiar as possible with the details of the process. Indeed, the intent of this chapter is to provide the reader with some insight into, and historical background on, the methods and algorithms that form the foundation of a few of the most common database searching techniques. These simple but incredibly useful computer experiments have many strengths and weaknesses, and they are subject to many misconceptions.

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Cite this chapter

Wishart, D.S. (2003). Sequence Similarity and Database Searching. In: Krawetz, S.A., Womble, D.D. (eds) Introduction to Bioinformatics. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-59259-335-4_27

  • DOI: https://doi.org/10.1007/978-1-59259-335-4_27

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-58829-241-4

  • Online ISBN: 978-1-59259-335-4

  • eBook Packages: Springer Book Archive
