Principles and Methods of Sequence Analysis

  • Eugene V. Koonin
  • Michael Y. Galperin


This chapter is the longest in the book because it deals with both the general principles and the practical aspects of sequence analysis and, to a lesser degree, structure analysis. Although these methods are not, in themselves, part of genomics, no reasonable genome analysis and annotation would be possible without an understanding of how they work and some practical experience with their use. Inappropriate use of sequence analysis procedures may result in numerous errors in genome annotation (we have already touched upon this subject in the previous chapter and discuss it further in Chapter 5). We have attempted to strike a balance between generalities and specifics, aiming to give the reader a clear perspective of the computational approaches used in comparative and functional genomics rather than to discuss any one of these approaches in great detail. In particular, we have refrained from any extensive discussion of the statistical basis and algorithmic aspects of sequence analysis because these can be found in several recent books on computational biology and bioinformatics (see ♦4.8) and, no less importantly, because we cannot claim advanced expertise in this area. We have also tried not to duplicate the “click here”-type tutorials that are available on many web sites. However, we deemed it important to point out some difficult and confusing issues in sequence analysis and to warn readers against the most common pitfalls.


Database Search; Substitution Matrices; BRCT Domain; Biotin Carboxylase; Amino Acid Substitution Matrices
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Further Reading

  1. Doolittle RF. 1986. Of Urfs and Orfs: A primer on how to analyze derived amino acid sequences. University Science Books, San Diego.
  2. Baxevanis AD and Ouellette BFF (eds). 2001. Bioinformatics: A practical guide to the analysis of genes and proteins. John Wiley & Sons, New York.
  3. Mount DW. 2000. Bioinformatics: Sequence and genome analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Chapter 1.
  4. Durbin R, Eddy SR, Krogh A and Mitchison G. 1997. Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK.
  5. Waterman MS. 1995. Introduction to Computational Biology: Maps, Sequences and Genomes. CRC Press, Boca Raton, FL.
  6. Gusfield D. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK.

Copyright information

© Springer Science+Business Media Dordrecht 2003

Authors and Affiliations

  • Eugene V. Koonin (1)
  • Michael Y. Galperin (1)
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, USA
