Back-Translation for Discovering Distant Protein Homologies
Frameshift mutations in protein-coding DNA sequences drastically change the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins’ common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method for inferring distant homology relations between two proteins that accounts for the frameshift and point mutations that may have affected their coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein. The goal is to determine the two putative DNA sequences whose alignment scores best under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that traditional alignment methods do not capture, as confirmed by biologically significant examples.
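The core idea, aligning two sets of back-translations rather than two protein sequences, can be illustrated with a small sketch. It is not the authors' algorithm. It assumes the standard genetic code and builds a naive, not memory-efficient, back-translation graph in which every codon of every amino acid gets its own three nucleotide nodes. It also replaces the paper's scoring system with hypothetical placeholder values (`match`, `mismatch`, and a linear per-nucleotide `gap` penalty), so any gap whose length is not a multiple of three plays the role of a frameshift. The hypothetical functions `back_translation_graph` and `align` run a Needleman-Wunsch-style dynamic program over pairs of graph nodes. They return the best score together with the two chosen putative DNA sequences in aligned form.

```python
from itertools import product

# Sketch only: standard genetic code (NCBI table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}  # amino acid -> list of its codons
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))


def back_translation_graph(protein):
    """Naive (not memory-efficient) back-translation DAG.

    Node 0 is an unlabeled source; every other node holds one nucleotide.
    Each source-to-sink path spells one putative coding sequence of protein.
    Nodes are numbered in topological order (predecessors come first).
    """
    labels, preds = [None], [[]]
    ends = [0]  # nodes where the previous amino acid's codons finish
    for aa in protein:
        new_ends = []
        for codon in CODONS[aa]:
            prev = ends
            for base in codon:
                labels.append(base)
                preds.append(list(prev))
                prev = [len(labels) - 1]
            new_ends.append(prev[0])
        ends = new_ends
    return labels, preds, ends


def align(p1, p2, match=2, mismatch=-1, gap=-3):
    """Best global alignment between any back-translation of p1 and any of p2.

    Hypothetical placeholder scoring: a linear per-nucleotide gap penalty,
    so gaps whose length is not a multiple of 3 act as frameshifts.
    """
    l1, pr1, sinks1 = back_translation_graph(p1)
    l2, pr2, sinks2 = back_translation_graph(p2)
    neg = float("-inf")
    D = [[neg] * len(l2) for _ in l1]      # D[u][v]: best score ending at u, v
    back = [[None] * len(l2) for _ in l1]  # predecessor cell for traceback
    D[0][0] = 0
    for u in range(len(l1)):
        for v in range(len(l2)):
            best, arg = neg, None
            if u and v:  # nucleotide u aligned with nucleotide v
                s = match if l1[u] == l2[v] else mismatch
                for p in pr1[u]:
                    for q in pr2[v]:
                        if D[p][q] + s > best:
                            best, arg = D[p][q] + s, (p, q)
            if u:  # nucleotide u aligned with a gap
                for p in pr1[u]:
                    if D[p][v] + gap > best:
                        best, arg = D[p][v] + gap, (p, v)
            if v:  # nucleotide v aligned with a gap
                for q in pr2[v]:
                    if D[u][q] + gap > best:
                        best, arg = D[u][q] + gap, (u, q)
            if arg is not None:  # leaves the source pair (0, 0) at score 0
                D[u][v], back[u][v] = best, arg
    score, u, v = max((D[a][b], a, b) for a in sinks1 for b in sinks2)
    top, bottom = [], []
    while (u, v) != (0, 0):  # walk back to the source pair
        p, q = back[u][v]
        top.append(l1[u] if p != u else "-")
        bottom.append(l2[v] if q != v else "-")
        u, v = p, q
    return score, "".join(reversed(top)), "".join(reversed(bottom))
```

As a made-up illustration, the proteins MTHGE and MPMVK share only their initial methionine. However, they have the coding sequences ATGACCCATGGTGAA and ATGCCCATGGTGAAA, which differ by one deleted and one inserted nucleotide. Under the placeholder scores, that gapped DNA alignment scores 22, so `align("MTHGE", "MPMVK")` returns a score of at least 22. The quadratic node-pair table is the simplest correct formulation; the compact graphs described in the abstract are what make the approach practical for real protein lengths.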