Computational Identification of Related Proteins
Molecular sequences that share a high degree of similarity are often thought to have evolved from common ancestral genes. Closely related protein sequences will presumably correspond to similar three-dimensional structures and conserved biological functions (although the reverse is not necessarily true: similar structures and conserved functions do not imply that the corresponding protein sequences will be similar; reviewed in ref. 1). These assumptions provide the basis for computational gene annotation. Typically, the first step in characterizing a novel gene is to compare its sequence against known sequences in available databases and to predict its origin and function by copying the annotation of the previously characterized sequences it most closely matches. This approach has been highly successful and is probably the only practical method applicable to large-scale annotation efforts at present. It should be pointed out, however, that this practice is not without its limitations (and is also unsatisfactory from the more theoretical perspective of those who wish to determine structure and function from primary sequence; for a provocative editorial on this subject, see ref. 2). The intrinsic problems of transitive propagation of historical annotation errors have been discussed elsewhere (ref. 3) and are all too familiar to any biologist who has looked into the databases only to find puzzling annotations that make no sense in light of current knowledge.
Keywords: Query Sequence; Blast Output; Protein Query Sequence; Protein Data Bank; Database Documentation File
- 3. Brendel, V. (2002) Integration of data management and analysis for genome research. In Schubert, S., Reusch, B., and Jesse, N. (eds.), “Informatik bewegt”. Lecture Notes in Informatics (LNI) - Proceedings P-20, 10–21.
- 10. Felsenstein, J. (1989) PHYLIP-Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
- 13. Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In Dayhoff, M. O. (ed.), Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, DC, pp. 345–362.
- 17. Worley, K. C., Wiese, B. A., and Smith, R. F. (1995) BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res. 5, 173–184.