Part of the book series: Computational Biology (COBO, volume 23)

Abstract

DNA or protein repeats are subsequences that recur within these molecules. Repeats may be adjacent to each other, in which case they are called tandem repeats, or dispersed, in which case they are called sequence motifs. Discovering such subsequences has several uses: locating genes, since repeats are frequently found near them; comparing sequences; and analyzing diseases, since the number of repeats is elevated in certain diseases. Rather than searching only for exact repeats, we may look for approximate repeats, which are encountered more often in experiments because of mutations in the sequences and measurement errors. We may search for repeats in a single sequence or in a set of sequences. In the latter case, the detected repeats also reveal the conserved structures in the set, which can be used to infer phylogenetic relationships. These structures can be discovered by combinatorial and probabilistic algorithms, as we describe. Graph-based methods build a k-partite similarity graph among k input sequences and then search for cliques in this graph. A clique of size k found this way has exactly one vertex in each partition and represents a motif common to all sequences. Distributed algorithms for this purpose are scarce, and we propose two new algorithms for detecting repeating sequences that are easy to implement and experiment with.
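The graph-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's algorithm. It assumes that the vertices of partition i are the length-l substrings of sequence i, and that two vertices from different partitions are joined when their Hamming distance is at most `max_dist`. The names `hamming`, `lmers`, `find_k_cliques`, `l`, and `max_dist` are hypothetical and chosen only for illustration. The edges are never stored: the backtracking search tests adjacency on the fly while picking one vertex per partition.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq, l):
    """All (position, substring) pairs of length l in seq: one partition's vertices."""
    return [(i, seq[i:i + l]) for i in range(len(seq) - l + 1)]


def find_k_cliques(seqs, l, max_dist):
    """Enumerate k-cliques of the implicit k-partite similarity graph.

    Assumption (not from the chapter): two vertices in different partitions
    are adjacent when their Hamming distance is at most max_dist.
    Each clique is returned as a tuple with one (position, substring)
    entry per input sequence.
    """
    parts = [lmers(s, l) for s in seqs]
    cliques = []

    def extend(chosen, level):
        # A vertex has been chosen from every partition: record the clique.
        if level == len(parts):
            cliques.append(tuple(chosen))
            return
        for pos, word in parts[level]:
            # The candidate must be adjacent to every vertex chosen so far.
            if all(hamming(word, w) <= max_dist for _, w in chosen):
                chosen.append((pos, word))
                extend(chosen, level + 1)
                chosen.pop()

    extend([], 0)
    return cliques


if __name__ == "__main__":
    seqs = ["TTACGTAA", "GGACGTCC", "ACGTTTTT"]
    for clique in find_k_cliques(seqs, l=4, max_dist=0):
        print(clique)
```

With `max_dist=0` only exact common substrings survive, while a larger threshold admits approximate motifs. The search is exponential in the worst case, so this sketch is suitable only for small inputs.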

Author information

Corresponding author

Correspondence to K. Erciyes.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Erciyes, K. (2015). Sequence Repeats. In: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol 23. Springer, Cham. https://doi.org/10.1007/978-3-319-24966-7_8

  • DOI: https://doi.org/10.1007/978-3-319-24966-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24964-3

  • Online ISBN: 978-3-319-24966-7

  • eBook Packages: Computer Science, Computer Science (R0)
