
Computation of the probabilities of families of biological sequences

  • Molecular Biophysics
Biophysics

Abstract

An algorithm for computing the probabilities of families of biological sequences is presented. The algorithm is applicable to many problems of bioinformatics, in particular, computing the sensitivity of seeds used in searching for local similarities in genomes and estimating the reliability of searches for clusters of regulatory sites. It can also be used with probability distributions described by various models, e.g., Bernoulli, Markov, and hidden Markov models. The algorithm describes both the probability distribution and the family of sequences by finite automata, which reduces the problem of calculating the probabilities to computing an appropriate generalized partition function. The algorithm can be applied not only to biological sequences but also to symbol sequences of any origin.
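The abstract states the central reduction only in words: once the family of sequences is given by a finite automaton, its probability is a sum over accepted paths in the product of that automaton with the probability model. The sketch below illustrates this idea under explicit assumptions that are not taken from the paper. The family is assumed to be "all sequences of length n that contain a given word", recognized by a KMP-style DFA. The model is assumed to be a first-order Markov chain, and a Bernoulli model is the special case with identical transition rows. The names `build_pattern_dfa` and `family_probability` are hypothetical. The code computes only the ordinary sum-product. It does not reproduce the paper's generalized partition function or its PAA/PT formalism for hidden Markov models.

```python
def build_pattern_dfa(pattern, alphabet):
    """Hypothetical helper: DFA accepting sequences that contain `pattern`.

    State q = length of the longest prefix of `pattern` that is a suffix of
    the text read so far; state len(pattern) is absorbing and accepting.
    """
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            if q == m:
                delta[q, a] = m  # pattern already seen: stay accepting
                continue
            s = pattern[:q] + a
            k = min(len(s), m)
            # shrink k until the last k symbols of s form a prefix of pattern
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[q, a] = k
    return delta, 0, {m}


def family_probability(delta, start, accepting, initial, transition, n):
    """Probability that a random length-n sequence is accepted by the DFA.

    Assumed model: first-order Markov chain with initial[a] = P(x1 = a) and
    transition[a][b] = P(x_{i+1} = b | x_i = a). Dynamic programming runs over
    product states (dfa_state, last_letter), summing path weights.
    """
    if n == 0:
        return 1.0 if start in accepting else 0.0
    weights = {}
    for a, p in initial.items():  # read the first letter
        key = (delta[start, a], a)
        weights[key] = weights.get(key, 0.0) + p
    for _ in range(n - 1):  # read letters 2..n
        nxt = {}
        for (q, a), w in weights.items():
            for b, p in transition[a].items():
                key = (delta[q, b], b)
                nxt[key] = nxt.get(key, 0.0) + w * p
        weights = nxt
    return sum(w for (q, _), w in weights.items() if q in accepting)


if __name__ == "__main__":
    dna = "ACGT"
    uniform = {c: 0.25 for c in dna}
    bernoulli = {a: dict(uniform) for a in dna}  # identical rows = Bernoulli
    delta, q0, acc = build_pattern_dfa("AC", dna)
    print(family_probability(delta, q0, acc, uniform, bernoulli, 3))  # 0.125
```

The product state pairs the DFA state with the last emitted letter because a first-order Markov model needs that letter to score the next one. A higher-order model would need a longer context in the product state. The work per position is proportional to the number of reachable product states times the alphabet size, not to the number of sequences in the family.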


Abbreviations

(D)FA: (deterministic) finite automaton
GPF: generalized partition function
HMM: hidden Markov model
PAA: probabilistic accepting automaton
PT: probability transducer



Additional information

Original Russian Text © M.A. Roytberg, 2009, published in Biofizika, 2009, Vol. 54, No. 5, pp. 791–797.


About this article

Cite this article

Roytberg, M.A. Computation of the probabilities of families of biological sequences. BIOPHYSICS 54, 569–573 (2009). https://doi.org/10.1134/S0006350909050029


  • DOI: https://doi.org/10.1134/S0006350909050029
