Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples

  • Application Problems
  • Pattern Recognition and Image Analysis

Abstract

An original spectral-statistical approach to detecting latent periodicity in biological sequences is proposed. The approach remains applicable when only a small statistical sample is available, and it avoids redundancy and instability in identifying the structure of the latent periodicity. Practical examples demonstrate that the approach yields optimal estimates of the periodicity pattern size for approximate tandem repeats.

Author information

Corresponding author

Correspondence to M. B. Chaley.

Additional information

Maria Borisovna Chaley was born in 1963 and graduated from the Moscow Institute of Physics and Technology in 1988 (MSc). She received her PhD in biophysics in 1993 and became a docent in bioinformatics in 2003. She is currently a senior research fellow at the Institute of Mathematical Biology Problems of the Russian Academy of Sciences. Her research interests include bioinformatics, genetic text analysis, and molecular evolution. She is the author of over 40 research publications, including 14 journal articles.

Nafisa Nailovna Nazipova was born in 1960 and graduated from the Department of Computational Mathematics and Cybernetics of Lenin Kazan State University in 1982 (MSc). She received her PhD in physics and mathematics (mathematical modeling, numerical methods, and program complexes) in 2002. She is currently the head of the bioinformatics laboratory at the Institute of Mathematical Biology Problems of the Russian Academy of Sciences. Her research interests include bioinformatics and the structural and functional organization of genetic sequences. She is the author of over 35 research publications, including 9 papers in refereed journals and 2 book chapters.

Vladimir Andreyevich Kutyrkin was born in 1952 and graduated from the Department of Mechanics and Mathematics of Lomonosov Moscow State University in 1974 (MSc). He received his PhD in physics and mathematics in 1995. He is currently a docent at Bauman Moscow State Technical University. His research interests include applied mathematical statistics, computational and discrete mathematics, and bioinformatics. He is the author of over 20 research publications, including 11 journal papers.

About this article

Cite this article

Chaley, M.B., Nazipova, N.N. & Kutyrkin, V.A. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. Pattern Recognit. Image Anal. 19, 358–367 (2009). https://doi.org/10.1134/S1054661809020217

  • DOI: https://doi.org/10.1134/S1054661809020217
