RECOMB-CG 2010: Comparative Genomics, pp. 188–197
An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats
Abstract
An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X (“motifs”) interspersed with one another. NTRs have been found in real DNA sequences and are expected to have applications for evolutionary studies, both as a tool to understand concerted evolution, and as a potential marker in population studies.
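To make the structure concrete, the following minimal sketch builds an exact NTR from two motifs. It assumes one common formalisation, in which runs of the motif x are separated by single copies of the motif X; the function name `build_exact_ntr` and the parameter `run_lengths` are illustrative and are not taken from the paper.

```python
def build_exact_ntr(x: str, X: str, run_lengths: list[int]) -> str:
    """Build an exact NTR of the assumed form x^s1 X x^s2 X ... X x^sn.

    Each entry of run_lengths is the number of copies of x in one run;
    consecutive runs are separated by exactly one copy of X.
    """
    if not x or not X:
        raise ValueError("both motifs must be non-empty")
    if any(s < 1 for s in run_lengths):
        raise ValueError("every run must contain at least one copy of x")
    # Join the runs of x with single copies of X between them.
    return X.join(x * s for s in run_lengths)


# Example with hypothetical motifs: runs of 2, 1 and 3 copies of "AC".
ntr = build_exact_ntr("AC", "GGT", [2, 1, 3])
```

An approximate NTR would then be a string that lies within a small edit distance of some string of this form.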
In this paper we describe software tools developed for database searches for NTRs. After a first program, NTRFinder, identifies putative NTR motifs, a confirmation step aligns the putative NTR against exact NTRs built from the putative template motifs x and X. We describe an algorithm that solves this alignment problem in O(|T|(|x| + |X|)) space and time. The algorithm is based on the wrap-around dynamic programming of Fischetti et al.
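The wrap-around idea can be illustrated on the simpler single-motif case. The sketch below is not the paper's NTR algorithm: it computes the unit-cost edit distance between a text and the best-fitting number of whole tandem copies of one pattern, in O(|T||P|) time. The function name and the unit costs are assumptions made for illustration, and only two rows are kept because no traceback is produced (recovering the alignment itself would need the full table).

```python
def wrap_around_distance(text: str, pattern: str) -> int:
    """Edit distance between text and pattern^k, minimised over k >= 0.

    State j (0 <= j < m) means the aligned target is some number of whole
    copies of the pattern followed by its first j characters. State 0
    therefore also stands for "ends on a whole copy".
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Row 0: empty text against the first j pattern characters.
    prev = list(range(m))
    for ch in text:
        cur = [0] * m
        for j in range(m):
            src = (j - 1) % m  # column m-1 wraps around to column 0
            cur[j] = min(
                prev[j] + 1,                       # text character unmatched
                prev[src] + (ch != pattern[src]),  # match or substitution
            )
        # Pattern characters left unmatched move along the row cyclically.
        # Two sweeps suffice: the second carries values across the wrap.
        for _ in range(2):
            for j in range(m):
                cur[j] = min(cur[j], cur[(j - 1) % m] + 1)
        prev = cur
    return prev[0]


# One substitution separates this text from three exact copies of "ACG".
d = wrap_around_distance("ACGACTACG", "ACG")
```

The key point is that the first column of each row may take values from the last column, so a single table of width |P| stands in for arbitrarily many concatenated copies. Handling two motifs x and X, as the paper does, presumably widens the table to |x| + |X| columns, which is consistent with the stated bound.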
References
- 1. Benson, G.: Tandem repeats finder: a program to analyze DNA sequences. Nucl. Acids Res. 27(2), 573–580 (1999)
- 2. Domaniç, N.O., Preparata, F.P.: A novel approach to the detection of genomic approximate tandem repeats in the Levenshtein metric. Journal of Computational Biology 14(7), 873–891 (2007)
- 3. Fischetti, V.A., Landau, G.M., Sellers, P.H., Schmidt, J.P.: Identifying periodic occurrences of a template with applications to protein structure. Information Processing Letters 45, 11–18 (1993)
- 4. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
- 5. Hauth, A.M., Joseph, D.: Beyond tandem repeats: complex pattern structures and distant regions of similarity. In: ISMB, pp. 31–37 (2002)
- 6. Matroud, A.A., Hendy, M.D., Tuffley, C.P.: NTRFinder: An Algorithm to Find Nested Tandem Repeats (2010), http://awcmee.massey.ac.nz/pdf_files/AtheerNTR_Apr.pdf
- 7. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
- 8. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48(3), 443–453 (1970)
- 9. Newman, A., Cooper, J.: Xstream: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics 8(1), 382 (2007)
- 10. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
- 11. Wexler, Y., Yakhini, Z., Kashi, Y., Geiger, D.: Finding approximate tandem repeats in genomic sequences. Journal of Computational Biology 12(7), 928–942 (2005); PMID: 16201913