Flanking monomer repeats determine decreased context complexity of single nucleotide polymorphism sites in the human genome

Safronova, N. S.; Ponomarenko, M. P.; Abnizova, I. I.; Orlova, G. V.; Chadaeva, I. V.; Orlov, Y. L.

doi:10.1134/S2079059716070121


Published: 18 December 2016

Volume 6, pages 809–815 (2016)

Russian Journal of Genetics: Applied Research

N. S. Safronova¹,²,
M. P. Ponomarenko¹,²,
I. I. Abnizova³,
G. V. Orlova¹,
I. V. Chadaeva¹ &
Y. L. Orlov¹,²


Abstract

The dependence of mutation frequency on the local sequence context in the human genome was studied using a set of documented single nucleotide polymorphisms (SNPs) from the 1000 Genomes Project. The development of new computational methods for the statistical analysis of genetic texts based on estimates of sequence complexity is discussed. Complexity profiles computed in a sliding window were applied to the analysis of SNP-containing sites in the human genome, and a local decrease in text complexity near SNPs was established. Analysis of the complexity profiles in SNP-containing regions showed that flanking monomer repeats account for the decreased context complexity of SNP sites in the human genome. The local decrease in the text complexity of sequences flanking SNP sites was also confirmed for polymorphism data from the rat and mouse genomes. Differences in context organization between coding and regulatory sequences, reflected in the text complexity of SNP-containing nucleotide sequences, were identified. Altered point mutation frequencies had previously been demonstrated for sequences containing microsatellites. Using a more general mathematical framework and more complete data, this work shows that the local genomic surroundings of SNPs are enriched in polytracts and simple repeats. Oligonucleotides overrepresented in the genomic surroundings of human SNPs were identified, and their association with polytracts was demonstrated. The presence of polytracts may indicate a higher probability of a double-strand DNA break at that position, resulting in an increased frequency of nucleotide substitutions.
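The abstract does not state which complexity measure underlies the sliding-window profiles. As an illustration only, the sketch below assumes linguistic complexity, one widely used measure: the product over word lengths k of the fraction of possible distinct k-mers that actually occur in the window. It is not necessarily the authors' measure, and the window size, flank length, SNP position, and function names are hypothetical. A monomer run (homopolymer) drives the value toward zero, which is the kind of local dip near SNP sites that the abstract describes.

```python
from typing import List, Optional


def linguistic_complexity(seq: str, max_k: Optional[int] = None) -> float:
    """Assumed measure: product over k of (distinct k-mers observed) / (max possible distinct k-mers)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    max_k = n if max_k is None else min(max_k, n)
    lc = 1.0
    for k in range(1, max_k + 1):
        positions = n - k + 1                      # number of overlapping k-letter words
        observed = len({seq[i:i + k] for i in range(positions)})
        possible = min(4 ** k, positions)          # bound for a 4-letter alphabet
        lc *= observed / possible
    return lc


def complexity_profile(seq: str, window: int = 20, step: int = 1) -> List[float]:
    """Complexity of each window of length `window`, sliding by `step`."""
    return [
        linguistic_complexity(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def snp_centered_profile(seq: str, snp_pos: int, flank: int = 50,
                         window: int = 20) -> List[float]:
    """Hypothetical helper: profile over snp_pos - flank .. snp_pos + flank (0-based, clipped)."""
    start = max(0, snp_pos - flank)
    end = min(len(seq), snp_pos + flank + 1)
    return complexity_profile(seq[start:end], window=window)


if __name__ == "__main__":
    # Toy sequence (hypothetical): a poly(A) tract around an assumed SNP at index 30
    seq = "ACGTTGCAAGTCCGATGCAT" + "A" * 12 + "GCTTCAGGTACCTGATCGGA"
    profile = snp_centered_profile(seq, snp_pos=30, flank=25, window=10)
    print(["%.3f" % v for v in profile])
```

Windows overlapping the poly(A) tract score far lower than windows in the mixed-composition flanks, so the profile shows a trough centered on the tract.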
The estimates were obtained with a previously developed software package that efficiently computes the frequency spectrum of oligonucleotides of a fixed length and compares nucleotide frequencies across larger samples, in addition to estimating the complexity of phased sequences.
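The software package is characterized here only by what it computes. A minimal sketch of the two operations mentioned, counting a fixed-length oligonucleotide spectrum and comparing frequencies between SNP flanks and a background set, might look as follows. It assumes overlapping k-mer counting and a ratio of relative frequencies with a pseudocount as the comparison statistic; both are illustrative choices, not necessarily the authors' method, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def kmer_spectrum(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer over ACGT in a collection of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if set(word) <= set("ACGT"):  # skip words containing N or other symbols
                counts[word] += 1
    return counts


def enrichment(fg: Counter, bg: Counter, pseudocount: float = 1.0) -> Dict[str, float]:
    """Assumed statistic: relative k-mer frequency in foreground / in background."""
    words = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(words)
    bg_total = sum(bg.values()) + pseudocount * len(words)
    return {
        w: ((fg[w] + pseudocount) / fg_total) / ((bg[w] + pseudocount) / bg_total)
        for w in words
    }


def is_polytract(word: str) -> bool:
    """True for homopolymer words such as AAAA or TTTT."""
    return len(set(word)) == 1


if __name__ == "__main__":
    snp_flanks = ["GATTTTTTTTCAG", "CAAAAAAAGTC"]       # hypothetical SNP flanks
    background = ["ACGTGCATCGATGCA", "TGCAGTCAGCTAGC"]  # hypothetical background
    scores = enrichment(kmer_spectrum(snp_flanks, 4), kmer_spectrum(background, 4))
    top: List[Tuple[str, float]] = sorted(scores.items(), key=lambda x: -x[1])[:5]
    for word, ratio in top:
        print(word, round(ratio, 2), "polytract" if is_polytract(word) else "")
```

Flagging each top-ranked word with `is_polytract` is one simple way to check whether overrepresented oligonucleotides are associated with polytracts, as the abstract reports.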



Author information

Authors and Affiliations

Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia
N. S. Safronova, M. P. Ponomarenko, G. V. Orlova, I. V. Chadaeva & Y. L. Orlov
Novosibirsk State University, Novosibirsk, Russia
N. S. Safronova, M. P. Ponomarenko & Y. L. Orlov
Sanger Center, Cambridge, UK
I. I. Abnizova


Corresponding author

Correspondence to N. S. Safronova.

Additional information

Original Russian Text © N.S. Safronova, M.P. Ponomarenko, I.I. Abnizova, G.V. Orlova, I.V. Chadaeva, Y.L. Orlov, 2015, published in Vavilovskii Zhurnal Genetiki i Selektsii, 2015, Vol. 19, No. 6, pp. 668–674.


About this article

Cite this article

Safronova, N.S., Ponomarenko, M.P., Abnizova, I.I. et al. Flanking monomer repeats determine decreased context complexity of single nucleotide polymorphism sites in the human genome. Russ J Genet Appl Res 6, 809–815 (2016). https://doi.org/10.1134/S2079059716070121


Received: 17 September 2015
Accepted: 27 October 2015
Published: 18 December 2016
Issue Date: December 2016
DOI: https://doi.org/10.1134/S2079059716070121
