Flanking monomer repeats determine decreased context complexity of single nucleotide polymorphism sites in the human genome

Russian Journal of Genetics: Applied Research

Abstract

The dependence of mutation frequency on local sequence context in the human genome was studied using a set of documented single nucleotide polymorphisms (SNPs) from the 1000 Genomes Project. The development of new computational methods for the statistical analysis of genetic texts, based on estimates of sequence complexity, was considered. The application of sliding-window complexity profiles to the analysis of SNP-containing sites in the human genome was demonstrated. A local decrease in text complexity near SNPs was established. Analysis of the complexity profiles in SNP-containing regions showed that flanking monomer repeats determine the decreased context complexity of SNP sites in the human genome. The local decrease in text complexity for sequences flanking SNP sites was confirmed on polymorphism data from the rat and mouse genomes. Differences in context organization between coding and regulatory sequences, reflected in the text complexity of SNP-containing nucleotide sequences, were determined. Changes in point mutation frequencies had previously been demonstrated for sequences containing microsatellites. Using a more general mathematical apparatus and more complete data, this work demonstrates that the local genomic surroundings of SNPs are saturated with polytracts and simple repeated sequences. Oligonucleotides with increased frequency in the genomic surroundings of human SNPs were identified, and their association with polytracts was demonstrated. The presence of polytracts may indicate a greater probability of a double-strand DNA break at that point, resulting in an increased frequency of nucleotide substitutions.
The estimates were obtained with a previously developed software package that, in addition to estimating the complexity of phased sequences, efficiently determines the frequency spectrum of fixed-length oligonucleotides and compares nucleotide frequencies in a larger sample.
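
As a rough illustration of the two operations named in the abstract (a frequency spectrum of fixed-length oligonucleotides and a complexity profile in a sliding window), the following Python sketch may help. It is not the authors' software. The abstract does not state which complexity measure was used, so Shannon entropy of k-mer frequencies is assumed here as a stand-in, and the function names, window size, and toy sequence are all hypothetical.

```python
from collections import Counter
from math import log2


def kmer_spectrum(seq, k):
    """Count every overlapping oligonucleotide (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the k-mer frequency distribution of seq.

    Assumption: entropy is used as a simple proxy for text complexity;
    the source does not specify the measure.
    """
    counts = kmer_spectrum(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p), summed over all observed k-mers
    return sum((c / total) * log2(total / c) for c in counts.values())


def complexity_profile(seq, window, k=1):
    """Entropy of each window of the given length, sliding one base at a time."""
    return [
        shannon_entropy(seq[i:i + window], k)
        for i in range(len(seq) - window + 1)
    ]


# Toy sequence (hypothetical): a poly-A tract flanked by mixed sequence.
toy = "ACGTACGT" + "AAAAAAAA" + "ACGTACGT"
profile = complexity_profile(toy, window=8)

# Position of the lowest-complexity window; here it coincides with the poly-A tract.
lowest = min(range(len(profile)), key=profile.__getitem__)
print(lowest, profile[lowest])
```

On real data, the windows would be centered on SNP coordinates and the profiles averaged over many sites. That step is left out because it depends on data formats the abstract does not describe.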

Author information

Corresponding author

Correspondence to N. S. Safronova.

Additional information

Original Russian Text © N.S. Safronova, M.P. Ponomarenko, I.I. Abnizova, G.V. Orlova, I.V. Chadaeva, Y.L. Orlov, 2015, published in Vavilovskii Zhurnal Genetiki i Selektsii, 2015, Vol. 19, No. 6, pp. 668–674.

About this article

Cite this article

Safronova, N.S., Ponomarenko, M.P., Abnizova, I.I. et al. Flanking monomer repeats determine decreased context complexity of single nucleotide polymorphism sites in the human genome. Russ J Genet Appl Res 6, 809–815 (2016). https://doi.org/10.1134/S2079059716070121

  • DOI: https://doi.org/10.1134/S2079059716070121
