Abstract
This chapter describes the methods of the spectral–statistical approach (2S-approach) for revealing latent periodicity in DNA sequences. It presents the results of analyzing data in the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral–statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of approximate tandem repeat. Examples are given of how latent profile periodicity revealed in CDSs correlates with structural–functional properties of the proteins.
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Chaley, M., Kutyrkin, V. (2016). Spectral–Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_16
DOI: https://doi.org/10.1007/978-1-4939-3572-7_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3570-3
Online ISBN: 978-1-4939-3572-7
eBook Packages: Springer Protocols