The method to compare nucleotide sequences based on the minimum entropy principle

Bulletin of Mathematical Biology

Abstract

A new method for comparing two or more symbol sequences is developed. The method compares the frequencies of short fragments of the sequences; it requires neither string editing nor any other transformation of the compared objects. The comparison is carried out by calculating the specific entropy of a frequency dictionary against a special dictionary, called the hybrid dictionary, which is the statistical ancestor of the group of sequences under comparison. Some applications of the method in genetics and bioinformatics are discussed.
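The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea, not the author's implementation. It rests on three assumptions that the text above does not confirm: a frequency dictionary is built from overlapping words of a fixed length q read along a linear (non-cyclic) sequence; the specific entropy is the relative entropy, the sum of f ln(f / g) over words; and the hybrid dictionary is the arithmetic mean of the compared dictionaries. The last choice follows from the second: the mean is the reference that minimizes the total relative entropy of the group. All function names are hypothetical.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all overlapping words of length q in seq.

    Assumption: the sequence is read as a linear string, not closed into a ring.
    """
    if len(seq) < q:
        raise ValueError("sequence is shorter than the word length q")
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def hybrid_dictionary(dictionaries):
    """Arithmetic mean of the dictionaries (assumed form of the hybrid).

    Under the relative-entropy assumption, this mean minimizes the summed
    entropy of all dictionaries against a common reference.
    """
    words = set().union(*dictionaries)
    k = len(dictionaries)
    return {w: sum(d.get(w, 0.0) for d in dictionaries) / k for w in words}


def specific_entropy(d, reference):
    """Relative entropy sum f * ln(f / g) of d against reference (assumed definition)."""
    return sum(f * log(f / reference[w]) for w, f in d.items() if f > 0)


def compare(sequences, q=3):
    """Specific entropy of each sequence's dictionary against the group hybrid."""
    dicts = [frequency_dictionary(s, q) for s in sequences]
    hybrid = hybrid_dictionary(dicts)
    return [specific_entropy(d, hybrid) for d in dicts]
```

Under these assumptions, calling compare on identical sequences returns zeros, and the values grow as the word frequencies of the sequences diverge. No alignment or editing step is involved, in line with the abstract.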

Cite this article

Sadovsky, M.G. The method to compare nucleotide sequences based on the minimum entropy principle. Bull. Math. Biol. 65, 309–322 (2003). https://doi.org/10.1016/S0092-8240(02)00107-6

  • DOI: https://doi.org/10.1016/S0092-8240(02)00107-6