RECOMB-CG 2010: Comparative Genomics, pp 188–197

An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats

  • Atheer A. Matroud
  • Michael D. Hendy
  • Christopher P. Tuffley
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6398)

Abstract

An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X (“motifs”) interspersed with one another. NTRs have been found in real DNA sequences and are expected to have applications for evolutionary studies, both as a tool to understand concerted evolution, and as a potential marker in population studies.
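To make the structure concrete, here is a minimal sketch that builds an exact NTR from two motifs. The abstract does not give the precise form, so the sketch assumes that runs of x are separated by single copies of X; the function name `build_exact_ntr` and the `run_lengths` parameter are hypothetical and chosen only for illustration.

```python
def build_exact_ntr(x, X, run_lengths):
    """Build an exact nested tandem repeat from motifs x and X.

    Assumed form (not stated in the abstract): x^s0 X x^s1 X ... X x^sn,
    i.e. runs of the motif x separated by single copies of the motif X.
    run_lengths holds the copy counts s0, s1, ..., sn.
    """
    if not x or not X:
        raise ValueError("both motifs must be non-empty")
    if not run_lengths or any(s < 0 for s in run_lengths):
        raise ValueError("run_lengths must be a non-empty list of counts >= 0")
    # Each run is x repeated s times; copies of X are placed between runs.
    return X.join(x * s for s in run_lengths)


if __name__ == "__main__":
    print(build_exact_ntr("AC", "GGT", [2, 1, 3]))
```

An approximate NTR would then be a string that lies within a small edit distance of some string of this form.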

In this paper we describe software tools developed for database searches for NTRs. After a first program, NTRFinder, identifies putative NTR motifs, a confirmation step aligns the putative NTR against exact NTRs built from the putative template motifs x and X. We describe an algorithm that solves this alignment problem in O(|T|(|x| + |X|)) space and time. The alignment algorithm is based on Fischetti et al.’s wrap-around dynamic programming.
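The wrap-around idea can be illustrated on the simpler single-motif case. The sketch below is not the two-motif NTR algorithm of the paper; it computes the edit distance between a text and the closest exact tandem repeat of one motif, assuming unit costs and an alignment that starts and ends on a motif boundary. The function name `wraparound_distance` and its cost parameters are hypothetical.

```python
def wraparound_distance(text, motif, sub=1, indel=1):
    """Edit distance from text to the nearest string motif^k (k >= 0).

    Column j (0 <= j < m) means "j characters into the current motif copy".
    Column 0 is reached from column m - 1, which is the wrap-around step.
    Only two rows are kept, so this returns the score without a traceback.
    """
    m = len(motif)
    if m == 0:
        raise ValueError("motif must be non-empty")
    # Empty text prefix: reach column j by skipping the first j motif characters.
    prev = [j * indel for j in range(m)]
    for ch in text:
        cur = [0] * m
        for j in range(m):
            k = (j - 1) % m  # predecessor column, wrapping around at j == 0
            diag = prev[k] + (0 if ch == motif[k] else sub)  # match or substitute
            up = prev[j] + indel                             # text char left unmatched
            cur[j] = min(diag, up)
        # Horizontal moves (skipping motif characters) form a cycle of columns.
        # With positive costs no optimal chain goes all the way round, so two
        # sweeps over the row are enough to settle every column.
        for _ in range(2):
            for j in range(m):
                k = (j - 1) % m
                cur[j] = min(cur[j], cur[k] + indel)
        prev = cur
    return prev[0]  # finish on a motif boundary


if __name__ == "__main__":
    print(wraparound_distance("ACGACTACG", "ACG"))
```

Each text character costs a constant number of passes over the m motif columns, so the running time is O(|T|·m). Keeping the full table for a traceback would need O(|T|·m) space as well, which matches the shape of the bound quoted in the abstract, with |x| + |X| in place of m.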

References

  1. Benson, G.: Tandem repeats finder: a program to analyze DNA sequences. Nucl. Acids Res. 27(2), 573–580 (1999)
  2. Domaniç, N.O., Preparata, F.P.: A novel approach to the detection of genomic approximate tandem repeats in the Levenshtein metric. Journal of Computational Biology 14(7), 873–891 (2007)
  3. Fischetti, V.A., Landau, G.M., Sellers, P.H., Schmidt, J.P.: Identifying periodic occurrences of a template with applications to protein structure. Information Processing Letters 45, 11–18 (1993)
  4. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)
  5. Hauth, A.M., Joseph, D.: Beyond tandem repeats: complex pattern structures and distant regions of similarity. In: ISMB, pp. 31–37 (2002)
  6. Matroud, A.A., Hendy, M.D., Tuffley, C.P.: NTRFinder: An Algorithm to Find Nested Tandem Repeats (2010), http://awcmee.massey.ac.nz/pdf_files/AtheerNTR_Apr.pdf
  7. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
  8. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48(3), 443–453 (1970)
  9. Newman, A., Cooper, J.: Xstream: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics 8(1), 382 (2007)
  10. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
  11. Wexler, Y., Yakhini, Z., Kashi, Y., Geiger, D.: Finding approximate tandem repeats in genomic sequences. Journal of Computational Biology 12(7), 928–942 (2005); PMID: 16201913

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Atheer A. Matroud (1, 2)
  • Michael D. Hendy (1, 2)
  • Christopher P. Tuffley (2)
  1. Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
  2. Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
