Biochemistry (Moscow), Volume 83, Issue 2, pp 129–139

Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes

  • I. S. Rusinov
  • A. S. Ershova
  • A. S. Karyagina
  • S. A. Spirin
  • A. V. Alexeevski


Many proteins must recognize specific DNA sequences in order to function. The number of recognition sites and their distribution along the DNA may be biologically important. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes, which lowers the probability of DNA cleavage by restriction endonucleases. We call a sequence exceptional if its frequency in a genome differs significantly from the frequency predicted by some mathematical model. An exceptional sequence can be either under- or over-represented, depending on whether its observed frequency is lower or higher than the predicted one. Exceptional sequences can be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome from the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of these methods has previously been performed. We compared three methods for predicting short sequence frequencies: the method based on the maximum-order Markov chain model, the method that uses the geometric mean of extended Markovian estimates, and the method that uses the frequencies of all subsequences, including discontiguous ones. We applied them to restriction sites in the complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: the lists of the 5% most under-represented sites differed by up to 50% between methods. The method designed by Burge and coauthors in 1992, which uses all subsequences of the sequence, showed higher precision than the other two methods, both on prokaryotic genomes and on randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detecting exceptional sequences in prokaryotic genomes.
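As an illustration of the first of the three methods, the sketch below computes the classical maximum-order Markov estimate for a word w of length k: the expected count is the count of its (k−1)-letter prefix times the count of its (k−1)-letter suffix, divided by the count of the (k−2)-letter core they share. This is a minimal sketch and not the authors' implementation. The function names are hypothetical, only one strand is counted, and no statistical significance is assessed, all of which are simplifying assumptions.

```python
from typing import Optional


def count_overlapping(genome: str, word: str) -> int:
    """Count occurrences of word in genome, allowing overlaps."""
    count = 0
    start = genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def markov_expected_count(genome: str, word: str) -> float:
    """Expected count of word under a Markov chain of order len(word) - 2.

    Assumption: single-strand counts only (a simplification for illustration).
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(genome, word[:-1])   # first k-1 letters
    suffix = count_overlapping(genome, word[1:])    # last k-1 letters
    core = count_overlapping(genome, word[1:-1])    # shared k-2 letters
    if core == 0:
        return 0.0
    return prefix * suffix / core


def observed_to_expected(genome: str, word: str) -> Optional[float]:
    """Ratio of observed to expected counts; None when the expectation is zero."""
    expected = markov_expected_count(genome, word)
    if expected == 0:
        return None
    return count_overlapping(genome, word) / expected
```

A ratio well below 1 suggests that the word may be under-represented and a ratio well above 1 that it may be over-represented. Deciding whether the deviation is significant requires a statistical criterion that this sketch does not include.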


Keywords: DNA sequence, prokaryotic genome, compositional bias, Markov chain model, restriction–modification system, restriction sites

Abbreviations

  • method of Burge and coauthors
  • base pairs
  • compositional bias
  • method based on Markov chain model of maximum order
  • method of Pevzner and coauthors
  • R-M system: restriction–modification system




References

  1. Adams, J., and Rothman, E. D. (1982) Estimation of phylogenetic relationships from DNA restriction patterns and selection of endonuclease cleavage sites, Proc. Natl. Acad. Sci. USA, 79, 3560–3564.
  2. Yi, S. V., and Goodisman, M. A. (2009) Computational approaches for understanding the evolution of DNA methylation in animals, Epigenetics, 4, 551–556.
  3. Bertani, G., and Weigle, J. J. (1953) Host controlled variation in bacterial viruses, J. Bacteriol., 65, 113–121.
  4. Dillingham, M. S., and Kowalczykowski, S. C. (2008) RecBCD enzyme and the repair of double-stranded DNA breaks, Microbiol. Mol. Biol. Rev., 72, 642–671.
  5. Levy, A., Goren, M. G., Yosef, I., Auster, O., Manor, M., Amitai, G., Edgar, R., Qimron, U., and Sorek, R. (2015) CRISPR adaptation biases explain preference for acquisition of foreign DNA, Nature, 520, 505–510.
  6. Kawamura, F., Mizukami, T., Shimotsu, H., Anzai, H., Takahashi, H., and Saito, H. (1981) Unusually infrequent cleavage with several endonucleases and physical map construction of Bacillus subtilis bacteriophage phi 1 DNA, J. Virol., 37, 1099–1102.
  7. Elton, R. A. (1974) Sites of cleavage by restriction enzymes in viral DNAs: comparison of observed and expected frequencies, Nucleic Acids Res., 1, 1343–1350.
  8. Brendel, V., Beckmann, J. S., and Trifonov, E. N. (1986) Linguistics of nucleotide sequences: morphology and comparison of vocabularies, J. Biomol. Struct. Dyn., 4, 11–21.
  9. Deschavanne, P., and Radman, M. (1991) Counterselection of GATC sequences in enterobacteriophages by the components of the methyl-directed mismatch repair system, J. Mol. Evol., 33, 125–132.
  10. Bhagwat, A. S., and McClelland, M. (1992) DNA mismatch correction by Very Short Patch repair may have altered the abundance of oligonucleotides in the E. coli genome, Nucleic Acids Res., 20, 1663–1668.
  11. Schbath, S., Prum, B., and de Turckheim, E. (1995) Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences, J. Comput. Biol., 2, 417–437.
  12. Merkl, R., and Fritz, H. J. (1996) Statistical evidence for a biochemical pathway of natural, sequence-targeted G/C to C/G transversion mutagenesis in Haemophilus influenzae Rd, Nucleic Acids Res., 24, 4146–4151.
  13. Gelfand, M. S., and Koonin, E. V. (1997) Avoidance of palindromic words in bacterial and archaeal genomes: a close connection with restriction enzymes, Nucleic Acids Res., 25, 2430–2439.
  14. Rocha, E. P., Danchin, A., and Viari, A. (2001) Evolutionary role of restriction/modification systems as revealed by comparative genome analysis, Genome Res., 11, 946–958.
  15. Fuglsang, A. (2003) Distribution of potential Type II restriction sites (palindromes) in prokaryotes, Biochem. Biophys. Res. Commun., 310, 280–285.
  16. Pevzner, P. A., Borodovsky, M. Yu., and Mironov, A. A. (1989) Linguistics of nucleotide sequences. I. The significance of deviations from mean statistical characteristics and prediction of the frequencies of occurrence of words, J. Biomol. Struct. Dyn., 6, 1013–1026.
  17. Burge, C., Campbell, A. M., and Karlin, S. (1992) Over- and under-representation of short oligonucleotides in DNA sequences, Proc. Natl. Acad. Sci. USA, 89, 1358–1362.
  18. Elhai, J. (2001) Determination of bias in the relative abundance of oligonucleotides in DNA sequences, J. Comput. Biol., 8, 151–175.
  19. Roberts, R. J., Vincze, T., Posfai, J., and Macelis, D. (2015) REBASE–a database for DNA restriction and modification: enzymes, genes and genomes, Nucleic Acids Res., 43, 298–299.
  20. Roberts, R. J., Belfort, M., Bestor, T., Bhagwat, A. S., Bickle, T. A., Bitinaite, J., Blumenthal, R. M., Degtyarev, S. Kh., Dryden, D. T., Dybvig, K., Firman, K., Gromova, E. S., Gumport, R. I., Halford, S. E., Hattman, S., Heitman, J., Hornby, D. P., Janulaitis, A., Jeltsch, A., Josephsen, J., Kiss, A., Klaenhammer, T. R., Kobayashi, I., Kong, H., Kruger, D. H., Lacks, S., Marinus, M. G., Miyahara, M., Morgan, R. D., Murray, N. E., Nagaraja, V., Piekarowicz, A., Pingoud, A., Raleigh, E., Rao, D. N., Reich, N., Repin, V. E., Selker, E. U., Shaw, P. C., Stein, D. C., Stoddard, B. L., Szybalski, W., Trautner, T. A., Van Etten, J. L., Vitor, J. M., Wilson, G. G., and Xu, S. Y. (2003) A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes, Nucleic Acids Res., 31, 1805–1812.
  21. Rusinov, I. S., Ershova, A. S., Karyagina, A. S., Spirin, S. A., and Alexeevski, A. V. (2015) Lifespan of restriction-modification systems critically affects avoidance of their recognition sites in host genomes, BMC Genomics, 16, 1084.
  22. Karlin, S., and Cardon, L. R. (1994) Computational DNA sequence analysis, Annu. Rev. Microbiol., 48, 619–654.
  23. Karlin, S., Mrazek, J., and Campbell, A. M. (1997) Compositional biases of bacterial genomes and evolutionary implications, J. Bacteriol., 179, 3899–3913.
  24. Ershova, A., Rusinov, I., Vasiliev, M., Spirin, S., and Karyagina, A. (2016) Restriction-modification systems interplay causes avoidance of GATC site in prokaryotic genomes, J. Bioinform. Comput. Biol., 14, 1641003.
  25. Zheng, Y., Posfai, J., Morgan, R. D., Vincze, T., and Roberts, R. J. (2009) Using shotgun sequence data to find active restriction enzyme genes, Nucleic Acids Res., 37, 1.

Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  • I. S. Rusinov (1)
  • A. S. Ershova (1, 2, 3)
  • A. S. Karyagina (1, 2, 3)
  • S. A. Spirin (1, 4, 5)
  • A. V. Alexeevski (1, 4, 5; corresponding author)

  1. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
  2. Gamaleya National Research Center of Epidemiology and Microbiology, Ministry of Health of the Russian Federation, Moscow, Russia
  3. All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
  4. Institute of System Studies, Moscow, Russia
  5. Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
