Abstract
An algorithm for computing the probabilities of families of biological sequences is presented. The algorithm is applicable to many problems in bioinformatics, in particular, computing seed sensitivity in the search for local similarities in genomes and estimating the reliability of searches for clusters of regulatory sites. It can also be used with probability distributions described by various models, e.g., Bernoulli, Markov, and hidden Markov models. The algorithm describes both the probability distribution and the family of sequences by finite automata, which reduces the problem of calculating the probabilities to computing an appropriate generalized partition function. The algorithm can be applied not only to biological sequences but also to symbol sequences of any origin.
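The reduction sketched in the abstract can be illustrated with a small dynamic program. This is a minimal sketch under our own assumptions, not a reproduction of the author's algorithm: the family of sequences is assumed to be given by a DFA (start state, transition table `delta`, accepting states), and the probability model is assumed to be a weighted automaton (a "source") whose transitions emit a letter with a given probability. A Bernoulli model is a source with one state; a first-order Markov chain uses the previous letter as its state. The names `family_probability`, `markov_source`, `contains_aa`, and `uniform` are hypothetical. The function sums, over pairs (source state, DFA state), the probability of all prefixes reaching that pair, which is one way to evaluate a partition-function-like sum over the product of the two automata.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Tuple

# Hypothetical representations chosen for this sketch (not from the paper):
# DFA    = (start state, delta[(state, letter)] -> state, accepting states)
# Source = (initial distribution over model states,
#           trans[model_state] -> list of (letter, next_model_state, prob))
DFA = Tuple[Hashable, Dict[Tuple[Hashable, str], Hashable], FrozenSet[Hashable]]
Source = Tuple[Dict[Hashable, float], Dict[Hashable, List[Tuple[str, Hashable, float]]]]


def family_probability(dfa: DFA, source: Source, n: int) -> float:
    """Probability that a length-n sequence generated by `source` is
    accepted by `dfa`, summed over the product of the two automata."""
    start, delta, accepting = dfa
    init, trans = source
    # weight[(s, q)]: total probability of all prefixes of the current length
    # that leave the source in state s and the DFA in state q.
    weight: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for s, p in init.items():
        weight[(s, start)] += p
    for _ in range(n):
        nxt: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
        for (s, q), w in weight.items():
            for letter, s2, p in trans[s]:
                nxt[(s2, delta[(q, letter)])] += w * p
        weight = nxt
    # Keep only the mass that ends in an accepting DFA state.
    return sum(w for (_, q), w in weight.items() if q in accepting)


def markov_source(first: Dict[str, float], cond: Dict[str, Dict[str, float]]) -> Source:
    """First-order Markov chain as a source: the model state is the previous
    letter, with the extra state '^' before the first letter."""
    trans = {"^": [(a, a, p) for a, p in first.items()]}
    for b, row in cond.items():
        trans[b] = [(a, a, p) for a, p in row.items()]
    return {"^": 1.0}, trans


ALPHABET = "ACGT"
# DFA for "contains AA": 0 = no progress, 1 = last letter was A, 2 = AA seen.
delta = {}
for a in ALPHABET:
    delta[(0, a)] = 1 if a == "A" else 0
    delta[(1, a)] = 2 if a == "A" else 0
    delta[(2, a)] = 2
contains_aa = (0, delta, frozenset({2}))

# Bernoulli model: one source state emitting each letter with probability 1/4.
uniform = ({0: 1.0}, {0: [(a, 0, 0.25) for a in ALPHABET]})

print(family_probability(contains_aa, uniform, 3))  # 7/64 = 0.109375
```

Each of the n steps touches at most |S|·|Q| weight entries (S the source states, Q the DFA states), so the cost grows linearly with n instead of exponentially, as it would with enumeration of all 4^n sequences. Under the same assumed interface, an HMM could be encoded by making each transition weight the product of an emission probability and a state-transition probability.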
Abbreviations
- (D)FA: (deterministic) finite automaton
- GPF: generalized partition function
- HMM: hidden Markov model
- PAA: probabilistic accepting automaton
- PT: probability transducer
Additional information
Original Russian Text © M.A. Roytberg, 2009, published in Biofizika, 2009, Vol. 54, No. 5, pp. 791–797.
About this article
Cite this article
Roytberg, M.A. Computation of the probabilities of families of biological sequences. BIOPHYSICS 54, 569–573 (2009). https://doi.org/10.1134/S0006350909050029
DOI: https://doi.org/10.1134/S0006350909050029