Sequence Repeats

Erciyes, K.

doi:10.1007/978-3-319-24966-7_8

K. Erciyes²⁵

Part of the book series: Computational Biology ((COBO,volume 23))

1811 Accesses

Abstract

DNA or protein repeats are recurring subsequences in these molecules. These repeats may be adjacent to each other in which case they are called tandem repeats or they may be dispersed named as sequence motifs. Discovery of such subsequences has various implications such as locating genes as they are frequently found near genes, comparing sequences, or disease analysis as the number of repeats is elevated in certain diseases. Instead of searching for exact repeats, we may be interested in finding approximate repeats, as these are encountered more frequently in experiments than exact ones due to mutations in sequences and erroneous measurements. We may search for repeats in a single sequence or a set of sequences. The detected repeats in the latter case provide also the conserved structures in the set which can be used to infer phylogenetic relationships. Discovery of these structures can be performed by combinatorial and probabilistic algorithms as we describe. Graph-based methods involve building of a k-partite similarity graph among k input sequences and then searching for cliques in this graph. A clique found this way will have a vertex in each partition and represent a common motif in all sequences. The distributed algorithms for this purpose are scarce and we propose two new algorithms to detect repeating sequences which can be easily experimented.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Hardcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abouelhoda MI, Kurtz S, Ohlebusch E (2002) The enhanced suffix array and its applications to genome analysis. In: Proceedings of WABI 2002, LNCS, vol 2452, pp 449–463. Springer
Google Scholar
Bailey TL, Elkan C (1995b) The value of prior knowledge in discovering motifs with MEME. In: Proceedings of thethird international conference on intelligent systems for molecular biology, pp 21–29. AAAI Press
Google Scholar
Bailey TL, Elkan C (1995a) Unsupervised leaning of multiple motifs in biopolymers using EM. Mach Learn 21:51–80
Google Scholar
Chun-Hsi H, Sanguthevar R (2003) Parallel pattern identification in biological sequences on clusters. IEEE Trans Nanobiosci 2(1):29–34
Article Google Scholar
Crochemore M (1981) An optimal algorithm for computing the repetitions in a word. Inf Process Lett 12(5):244–250
Article MathSciNet MATH Google Scholar
Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18:354–363
Article Google Scholar
Floratos A, Rigoutsos I (1998) On the time complexity of the TEIRESIAS algorithm. In: Research report RC 21161 (94582), IBM T.J. Watson Research Center
Google Scholar
Gatchel JR, Zoghbi HY (2005) Diseases of unstable repeat expansion: mechanisms and common principles. Nat Rev Genet 6:743–755
Article Google Scholar
Goldstein DB, Schlotterer C (1999) Microsatellites: evolution and applications, 1st edn. Oxford University Press, ISBN-10: 0198504071, ISBN-13: 978-0198504078
Google Scholar
Grundy WN, Bailey TL, Elkan CP (1996) ParaMEME: a parallel implementation and a web interface for a DNA and protein motif discovery tool. Comput Appl Biosci 12(4):303–310
Google Scholar
Ikebata H, Yoshida R (2015) Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets. Bioinformatics 1–8: doi:10.1093/bioinformatics/btv017
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214
Article Google Scholar
Lim KG, Kwoh CK, Hsu LY, Wirawan A (2012) Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance. Brief Bioinform. doi:10.1093/bib/bbs023
Google Scholar
Marsan L, Sagot MF (2000) Algorithms for extracting structured motifs using a suffix tree with application to promoter and regulatory site consensus identification. J Comput Biol 7(3/4):345–360
Article Google Scholar
Matroud A (2013) Nested tandem repeat computation and analysis. Ph.D. Thesis, Massey University
Google Scholar
Mejia YP, Olmos I, Gonzalez JA (2010) Structured motifs identification in DNA sequences. In: Proceedings of the twenty-third international florida artificial intelligence research society conference (FLAIRS 2010), pp 44–49
Google Scholar
Modan K, Das MK, Dai H-K (2007) A survey of DNA motif finding algorithms. BMC Bioinform 8(Suppl 7):S21
Article Google Scholar
Mohantyr S, Sahu B, Acharya AK (2013) Parallel implementation of exact algorithm for planted (l,d) motif search. In: Proceedings of the international conference on advances in computer science, AETACS
Google Scholar
Mourad E, Albert YZ (eds) (2011) Algorithms in computational molecular biology: techniques, approaches and applications. Wiley series in bioinformatics, Chap. 18
Google Scholar
Mourad E, Albert YZ (eds) (2011) Algorithms in computational molecular biology: techniques, approaches and applications, pp 386–387. Wiley
Google Scholar
Nicolae M, Rajasekaran S (2014) Efficient sequential and parallel algorithms for planted motif search. BMC Bioinform 15:34. doi:10.1186/1471-2105-15-34
Article Google Scholar
Pardalos PM, Rappe J, Resende MGC (1998) An exact parallel algorithm for the maximum clique problem. In: De Leone et al (eds) High performance algorithms and software in nonlinear optimization, vol 24. Kluwer, Dordrecht, pp 279–300
Google Scholar
Parson W, Kirchebner R, Muhlmann R, Renner K, Kofler A, Schmidt S, Kofler R (2005) Cancer cell line identification by short tandem repeat profiling: power and limitations. FASEB J 19(3):434–436
Google Scholar
Pelotti S, Ceccardi S, Alu M, Lugaresi F, Trane R, Falconi M, Bini C, Cicognani A (2008) Cancerous tissues in forensic genetic analysis. Genet Test 11(4):397–400
Article Google Scholar
Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of the eighth international conference on intelligent systems on molecular biology. San Diego, CA, pp 269–278
Google Scholar
Rajasekaran S, Balla S, Huang C-H, Thapar V, Gryk M, Maciejewski M, Schiller M (2005) High-performance exact algorithms for motif search. J Clin Monitor Comput 19:319–328
Article Google Scholar
Rajasekaran S, Balla S, Huang C-H (2005) Exact algorithms for planted motif problems. J Comput Biol 12(8):1117–1128
Article Google Scholar
Sagot MF (1998) Spelling approximate repeated or common motifs using a suffix tree. In: Proceedings of the theoretical informatics conference (Latin98), pp 111–127
Google Scholar
Sahli M, Mansour E, Kalnis P (2014) ACME: Efficient parallel motif extraction from very long sequences. VLDB J 23:871–893
Article Google Scholar
Satya RV, Mukherjee A (2004) New algorithms for Finding monad patterns in DNA sequences. In: Proceedings of SPIRE 2004, LNCS, vol 3246, pp 273–285. Springer
Google Scholar
Srinivas A (ed) (2005) Handbook of computational molecular biology. Computer and information science series, Chap. 5, December 21, 2005. Chapman & Hall/CRC, Boca Raton
Google Scholar
Stoye J, Gusfield D (2002) Simple and flexible detection of contiguous repeats using a suffix tree. Theor Comput Sci 1–2:843–856
Article MathSciNet MATH Google Scholar
Thota S, Balla S, Rajasekaran S (2007) Algorithms for motif discovery based on edit distance. In: Technical report, BECAT/CSE-TR-07-3
Google Scholar
Zambelli F, Pesole G, Pavesi G (2012) Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 14:225–237
Article Google Scholar
Zhang S, Li S, Niu M, Pham PT, Su Z (2011) MotifClick: prediction of cis-regulatory binding sites via merging cliques. BMC Bioinf 12:238
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computer Engineering Department, Izmir University, Uckuyular, Izmir, Turkey
K. Erciyes

Authors

K. Erciyes
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to K. Erciyes .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Erciyes, K. (2015). Sequence Repeats. In: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol 23. Springer, Cham. https://doi.org/10.1007/978-3-319-24966-7_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-24966-7_8
Published: 01 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24964-3
Online ISBN: 978-3-319-24966-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics