Abstract
The statistical approach to molecular sequence evolution involves stochastic modeling of the substitution, insertion, and deletion processes. Substitution has been modeled reliably for more than three decades using finite Markov processes. Insertion and deletion, however, appear to be more difficult to model, and recent approaches cannot adequately handle multiple insertions and deletions. A new method based on a generating function approach is introduced to describe the multiple insertion process. The presented algorithm computes the approximate joint probability of two sequences in O(l^3) running time, where l is the geometric mean of the sequence lengths.
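As a minimal illustration of what modeling substitution with a finite Markov process can look like, the sketch below uses the Jukes-Cantor model on the four-letter DNA alphabet. The choice of Jukes-Cantor, the rate parameter `alpha`, and the function names are assumptions made here for illustration; the abstract does not say which substitution model the paper uses, and the sketch deliberately ignores insertions and deletions.

```python
import math

BASES = "ACGT"


def jc69_transition_matrix(t, alpha=1.0):
    """Return the 4x4 transition matrix P(t) of the Jukes-Cantor model.

    Every base changes to each of the three other bases at rate alpha
    (an illustrative parameter, not taken from the paper).
    """
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay   # probability the base is unchanged after time t
    diff = 0.25 - 0.25 * decay   # probability of ending at one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def ungapped_substitution_probability(ancestor, descendant, t, alpha=1.0):
    """Probability that `ancestor` turns into `descendant` in time t
    by substitutions alone, with sites evolving independently."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must have equal length (no indels modeled)")
    p = jc69_transition_matrix(t, alpha)
    prob = 1.0
    for a, d in zip(ancestor, descendant):
        prob *= p[BASES.index(a)][BASES.index(d)]
    return prob


if __name__ == "__main__":
    print(ungapped_substitution_probability("ACGT", "ACGA", t=0.1))
```

The equal-length check marks where this simple picture stops: once insertions and deletions are allowed, the two sequences can differ in length, which is the harder part of the problem the abstract refers to.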
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miklós, I., Toroczkai, Z. (2001). An Improved Model for Statistical Alignment. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_1
DOI: https://doi.org/10.1007/3-540-44696-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive