A New Method for Finding Approximate Repetitions in DNA Sequences

  • Di Wang
  • Guoren Wang
  • Qingquan Wu
  • Baichen Chen
  • Yi Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


Searching for approximate repetitions in a DNA sequence is an important topic in gene analysis. One difficulty is that, because patterns vary in length, the similarity between patterns cannot be judged accurately using ED (Edit Distance) alone. In this paper we define a new similarity function that considers both the differences and the commonalities between patterns at the same time. Because this function is expensive to compute, we also propose two new filtering methods, one based on frequency distance and one based on Pearson correlation, which efficiently select a candidate set of approximate repetitions. We use SUA instead of a sliding window to extract fragments from a DNA sequence, so the patterns of an approximate repetition are not restricted in length. The results show that our technique finds more approximate repetitions than Tandem Repeats Finder.
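The abstract does not spell out the new similarity function or the exact form of either filter, so the following is only a rough sketch under stated assumptions. It assumes the frequency-distance filter behaves like Kahveci and Singh's frequency distance over the alphabet {A, C, G, T}, which is a known lower bound on edit distance: any pair whose frequency distance already exceeds an edit-distance threshold can be discarded without running the quadratic ED computation. The Pearson-correlation stage is a hypothetical heuristic that correlates the two fragments' nucleotide-count vectors and drops pairs below an assumed threshold. The names `frequency_distance`, `pearson`, `is_candidate`, `max_ed` and `min_corr` are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def edit_distance(u: str, v: str) -> int:
    """Unit-cost Levenshtein distance via a rolling one-row DP table."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, 1):
        cur = [i]
        for j, b in enumerate(v, 1):
            cur.append(min(prev[j] + 1,               # delete a from u
                           cur[j - 1] + 1,            # insert b into u
                           prev[j - 1] + (a != b)))   # match or substitute
        prev = cur
    return prev[-1]


def frequency_vector(s: str) -> list:
    """Count of each nucleotide, in ALPHABET order."""
    counts = Counter(s)
    return [counts[c] for c in ALPHABET]


def frequency_distance(u: str, v: str) -> int:
    """Kahveci-Singh style frequency distance; never exceeds edit_distance(u, v)."""
    fu, fv = frequency_vector(u), frequency_vector(v)
    pos = sum(max(0, a - b) for a, b in zip(fu, fv))
    neg = sum(max(0, b - a) for a, b in zip(fu, fv))
    return max(pos, neg)


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # hypothetical convention: correlation undefined for constant vectors
    return cov / (sx * sy)


def is_candidate(u: str, v: str, max_ed: int, min_corr: float) -> bool:
    """Hypothetical two-stage filter run before the costlier similarity check."""
    # Stage 1 (lossless): ED >= FD, so FD > max_ed rules the pair out.
    if frequency_distance(u, v) > max_ed:
        return False
    # Stage 2 (heuristic, may drop true repeats): compare nucleotide compositions.
    if pearson(frequency_vector(u), frequency_vector(v)) < min_corr:
        return False
    return True
```

For example, `is_candidate("AAACGT", "AAACGA", 1, 0.9)` passes both stages, while `is_candidate("AAAA", "CCCC", 2, 0.0)` is rejected by the frequency-distance bound alone. Only the first stage is guaranteed not to lose true matches; a correlation threshold can also discard genuine repeats, so its value would need tuning.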







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Di Wang (1)
  • Guoren Wang (1)
  • Qingquan Wu (1, 2)
  • Baichen Chen (1)
  • Yi Zhao (1)

  1. College of Information Science & Engineering, Northeastern University, Shenyang, China
  2. Shanghai Baosight Ltd., Shanghai, China
