Computational Identification of Related Proteins

BLAST, PSI-BLAST, and Other Tools
  • Qunfeng Dong
  • Volker Brendel
Part of the Springer Protocols Handbooks book series (SPH)


Molecular sequences that share a high degree of similarity are often thought to have evolved from common ancestral genes. Closely related protein sequences will presumably correspond to similar three-dimensional structures and conserved biological functions (although the reverse is not necessarily true: similar structures and conserved functions do not imply that the corresponding protein sequences will be similar; reviewed in ref. 1). These assumptions provide the basis for computational gene annotation. Typically, the first step in characterizing a novel gene is to compare its sequence against known sequences in available databases and to predict its origin and function by copying the annotation of the most similar previously characterized sequences. This approach has been highly successful and is probably the only practical method for large-scale annotation efforts at present. It should be pointed out, however, that this practice has its limitations (and is also unsatisfactory from the more theoretical perspective of those who wish to determine structure and function from primary sequence; for a provocative editorial on this subject, see ref. 2). The intrinsic problem of transitive annotation, whereby an error in a historical annotation is copied to every sequence subsequently annotated from it, has been discussed elsewhere (ref. 3) and is all too familiar to any biologist who has searched the databases only to find puzzling annotations that make no sense in light of current knowledge.
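The annotation-by-similarity step described above can be sketched as a best-hit transfer. This minimal sketch assumes the query set was searched with BLAST+ and saved in its default tabular format (`-outfmt 6`, twelve tab-separated columns ending in `evalue` and `bitscore`). The function names, the `annotations` lookup table, the toy hit lines, and the 1e-10 E-value cutoff are hypothetical choices for illustration, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per non-empty, non-comment line of -outfmt 6 output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        yield Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                  float(row["evalue"]), float(row["bitscore"]))


def transfer_annotation(lines: Iterable[str],
                        annotations: Dict[str, str],
                        max_evalue: float = 1e-10) -> Dict[str, Optional[str]]:
    """Copy the description of each query's best hit (hypothetical scheme).

    The best hit is the one with the lowest E-value, ties broken by the
    higher bit score. A query whose best hit exceeds max_evalue, or whose
    subject has no stored description, maps to None. Queries with no hits
    at all do not appear in the result.
    """
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        current = best.get(hit.query)
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[hit.query] = hit
    return {query: (annotations.get(hit.subject) if hit.evalue <= max_evalue else None)
            for query, hit in best.items()}


# Toy, made-up hits and descriptions, for illustration only.
blast_out = [
    "q1\tsp|P1\t45.2\t300\t160\t2\t1\t300\t1\t298\t1e-40\t350.0",
    "q1\tsp|P2\t30.1\t280\t190\t4\t5\t285\t3\t280\t2e-12\t120.0",
    "q2\tsp|P3\t22.0\t90\t68\t1\t10\t99\t20\t109\t0.5\t30.0",
]
descriptions = {"sp|P1": "serine/threonine kinase",
                "sp|P2": "adaptor protein",
                "sp|P3": "BAR domain protein"}

print(transfer_annotation(blast_out, descriptions))
# {'q1': 'serine/threonine kinase', 'q2': None}
```

Because each query simply inherits its best hit's description, any error stored in `annotations` is copied verbatim to every query that hits that entry, which is the transitive propagation problem noted above. A stricter cutoff only limits which hits qualify; it does not protect against an incorrect annotation on a highly similar sequence.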

References

  1. Weir, M., Swindells, M., and Overington, J. (2001) Insights into protein function through large-scale computational analysis of sequence and structure. Trends Biotechnol. 19, S61–S6.
  2. Konopka, A. K. (2003) Selected dreams and nightmares about computational biology. Comp. Biol. & Chem. 27, 91–92.
  3. Brendel, V. (2002) Integration of data management and analysis for genome research. In Schubert, S., Reusch, B., and Jesse, N. (eds.), “Informatik bewegt”. Lecture Notes in Informatics (LNI), Proceedings P-20, 10–21.
  4. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
  5. Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
  6. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. (2003) GenBank. Nucleic Acids Res. 31, 23–27.
  7. Westbrook, J., Feng, Z., Chen, L., Yang, H., and Berman, H. M. (2003) The Protein Data Bank and structural genomics. Nucleic Acids Res. 31, 489–491.
  8. Higgins, D. G., Thompson, J. D., and Gibson, T. J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266, 383–402.
  9. Kumar, S., Tamura, K., and Nei, M. (1994) MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput. Appl. Biosci. 10, 189–191.
  10. Felsenstein, J. (1989) PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
  11. Vogt, G., Etzold, T., and Argos, P. (1995) An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol. 249, 816–831.
  12. Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919.
  13. Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In Dayhoff, M. O. (ed.), Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, DC, pp. 345–362.
  14. Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129.
  15. Rost, B. (2002) Enzyme function less conserved than anticipated. J. Mol. Biol. 318, 595–608.
  16. Xing, L. and Brendel, V. (2001) Multi-query sequence BLAST output examination with MuSeq Box. Bioinformatics 17, 744–745.
  17. Worley, K. C., Wiese, B. A., and Smith, R. F. (1995) BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res. 5, 173–184.
  18. Brinkman, F. S., Wan, I., Hancock, R. E., Rose, A. M., and Jones, S. J. (2001) PhyloBLAST: facilitating phylogenetic analysis of BLAST results. Bioinformatics 17, 385–387.
  19. Paquola, A. C., Machado, A. A., Reis, E. M., Da Silva, A. M., and Verjovski-Almeida, S. (2003) Zerg: a very fast BLAST parser library. Bioinformatics 19, 1035–1036.
  20. Altschul, S. F. and Koonin, E. V. (1998) Iterated profile searches with PSI-BLAST: a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447.
  21. Jones, D. T. and Swindells, M. B. (2002) Getting the most from PSI-BLAST. Trends Biochem. Sci. 27, 161–164.
  22. Mitsuuchi, Y., Johnson, S. W., Sonoda, G., Tanno, S., Golemis, E. A., and Testa, J. R. (1999) Identification of a chromosome 3p14.3-21.1 gene, APPL, encoding an adaptor molecule that interacts with the oncoprotein-serine/threonine kinase AKT2. Oncogene 18, 4891–4898.
  23. Miaczynska, M., Christoforidis, S., Giner, A., et al. (2004) APPL proteins link Rab5 to nuclear signal transduction via an endosomal compartment. Cell 116, 445–456.
  24. Peter, B. J., Kent, H. M., Mills, I. G., et al. (2004) BAR domains as sensors of membrane curvature: the amphiphysin BAR structure. Science 303, 495–499.
  25. Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
  26. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
  27. Smith, T. and Waterman, M. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
  28. Usuka, J., Zhu, W., and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16, 203–211.
  29. Kent, W. J. (2002) BLAT: the BLAST-like alignment tool. Genome Res. 12, 656–664.
  30. Pertsemlidis, A. and Fondon, J. W., 3rd (2001) Having a BLAST with bioinformatics (and avoiding BLASTphemy). Genome Biol. 2, reviews2002.1–2002.10.

Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Qunfeng Dong (1)
  • Volker Brendel (2)
  1. Department of Genetics, Development and Cell Biology, Iowa State University, IA
  2. Department of Genetics, Development and Cell Biology, and Department of Statistics, Iowa State University, IA