Spectral–Statistical Approach for Revealing Latent Regular Structures in DNA Sequence

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1415))

Abstract

Methods of the spectral–statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from an analysis of the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral–statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of an approximate tandem repeat. Examples are given of how latent profile periodicity revealed in CDSs correlates with structural–functional properties of the proteins.

Author information

Corresponding author

Correspondence to Maria Chaley.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Chaley, M., Kutyrkin, V. (2016). Spectral–Statistical Approach for Revealing Latent Regular Structures in DNA Sequence. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_16

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
