
Search of Regions with Periodicity Using Random Position Weight Matrices in the Genome of C. elegans

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Abstract

This study develops a mathematical method for detecting tandem repeats in a DNA sequence. A multiple alignment of periods is calculated by direct optimization of a position-weight matrix (PWM), without using pairwise alignments or searching for similarity between periods. Random PWMs were used to develop a new algorithm for periodicity search. The algorithm was applied to the DNA sequences of the C. elegans genome and found 25360 regions with a periodicity of 2 to 50 bases in length. On average, the periodicity associated with each region spanned ~4000 nucleotides. A significant portion of the regions have periods of 10 and 11 nucleotides, periods that are multiples of 10 nucleotides, or periods in the vicinity of 35 nucleotides. Only ~30% of the periods found had been discovered previously. The origin of periodicity with insertions and deletions is also discussed.
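
To make the idea of scoring periodicity with a PWM concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a fixed period with no insertions or deletions, builds the matrix directly from base counts at each phase rather than optimizing random PWMs, and uses a pseudocount of 1. The function names (`phase_counts`, `log_odds_pwm`, `periodicity_score`) are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def phase_counts(seq, period):
    """Count how often each base occurs at each phase (index modulo period)."""
    counts = [[0] * len(BASES) for _ in range(period)]
    for i, base in enumerate(seq):
        if base in BASES:
            counts[i % period][BASES.index(base)] += 1
    return counts


def log_odds_pwm(counts, pseudocount=1.0):
    """Convert phase counts into a log-odds matrix against the overall base composition."""
    total = sum(sum(row) for row in counts)
    background = [
        (sum(row[b] for row in counts) + pseudocount) / (total + 4 * pseudocount)
        for b in range(len(BASES))
    ]
    pwm = []
    for row in counts:
        n = sum(row)
        pwm.append([
            math.log(((row[b] + pseudocount) / (n + 4 * pseudocount)) / background[b])
            for b in range(len(BASES))
        ])
    return pwm


def periodicity_score(seq, pwm):
    """Sum the matrix weights along the sequence, cycling through the phases."""
    period = len(pwm)
    return sum(
        pwm[i % period][BASES.index(base)]
        for i, base in enumerate(seq)
        if base in BASES
    )


# Illustrative input: a perfect tandem repeat with period 5.
seq = "ACGTT" * 40
pwm5 = log_odds_pwm(phase_counts(seq, 5))
pwm4 = log_odds_pwm(phase_counts(seq, 4))
score5 = periodicity_score(seq, pwm5)
score4 = periodicity_score(seq, pwm4)
```

For this toy sequence the matrix built at the true period scores far higher than the one built at a mismatched period, because each phase is dominated by a single base. Raw scores at different period lengths are not directly comparable on real data, since longer periods have more free parameters; a real search needs a significance estimate and, as the abstract notes, a way to handle insertions and deletions.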

This work was supported by the Russian Science Foundation.

Acknowledgements

This work was supported by the Competitiveness Growth Program of the Federal Autonomous Educational Institution of Higher Professional Education National Research Nuclear University MEPhI (Moscow Engineering Physics Institute).

Author information

Corresponding author

Correspondence to E. V. Korotkov.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Korotkov, E.V., Korotkova, M.A. (2017). Search of Regions with Periodicity Using Random Position Weight Matrices in the Genome of C. elegans. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_40

  • DOI: https://doi.org/10.1007/978-3-319-56154-7_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56153-0

  • Online ISBN: 978-3-319-56154-7

  • eBook Packages: Computer Science, Computer Science (R0)
