Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation

  • Conference paper
Algorithms in Bioinformatics (WABI 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

Applying our techniques for extracting approximate non-tandem repeats [1] to well-constructed maximal models, we derive an algorithm that finds common motifs of length P occurring in N sequences with at most D differences under the edit distance metric. We compare the effectiveness of our algorithm with the more involved algorithm of Sagot [17] for edit distance on some real sequences. Her method had previously been implemented only for Hamming distance [12, 20], not for edit distance. Our resulting method turns out to be simpler and more efficient, both in theory and in practice, for moderately large P and D.
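
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper, nor that of Sagot [17]; it is a naive baseline that enumerates every candidate motif of length P and keeps those lying within edit distance D of some substring of every input sequence. The function names, the DNA alphabet, and the reading of "occur in N sequences" as "occur in all N sequences" are assumptions made for illustration.

```python
from itertools import product


def min_edit_distance_to_substring(motif, text):
    """Smallest edit distance between `motif` and any substring of `text`.

    Standard approximate-matching dynamic program: row 0 is all zeros,
    so an occurrence may start at any position of `text` at no cost.
    """
    prev = [0] * (len(text) + 1)
    for i, m in enumerate(motif, start=1):
        curr = [i] + [0] * len(text)  # motif prefix vs. empty substring
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (m != t),  # match or substitution
                prev[j] + 1,             # motif character left unmatched
                curr[j - 1] + 1,         # extra character in the text
            )
        prev = curr
    # An occurrence may also end anywhere, so take the best end position.
    return min(prev)


def common_motifs(sequences, P, D, alphabet="ACGT"):
    """Naive baseline: all length-P motifs within distance D in every sequence.

    Runs in time exponential in P; intended only to illustrate the problem.
    """
    found = []
    for letters in product(alphabet, repeat=P):
        motif = "".join(letters)
        if all(min_edit_distance_to_substring(motif, s) <= D for s in sequences):
            found.append(motif)
    return found


if __name__ == "__main__":
    print(common_motifs(["ACGT", "TACGA"], P=3, D=0))  # ['ACG']
```

Because this baseline tries all |alphabet|^P candidates, it is practical only for very small P; avoiding that exhaustive enumeration is the kind of cost that suffix-tree-based methods such as those compared in the abstract are designed to reduce.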

References

  1. E. F. Adebiyi, T. Jiang, and M. Kaufmann. An efficient algorithm for finding short approximate non-tandem repeats (Extended Abstract). Bioinformatics, 17(1):S5–S13, 2001.

  2. E. F. Adebiyi. Pattern Discovery in Biology and Strings Sorting: Theory and Experimentation. Ph.D. Thesis, 2002.

  3. A. Blumer, A. Ehrenfeucht, and others. Average size of suffix trees and DAWGS. Discrete Applied Mathematics, 24:37–45, 1989.

  4. J.-M. Claverie and S. Audic. The statistical significance of nucleotide position-weight matrix matches. Computer Applications in Biosciences, 12(5):431–439, 1996.

  5. M. Crochemore and M.-F. Sagot. Motifs in sequences: localization and extraction. In Handbook of Computational Chemistry, Crabbe, Drew, Konopka, eds., Marcel Dekker, Inc., 2001. To appear.

  6. D. Gusfield. Algorithms on strings, trees and sequences. Cambridge University Press, New York, 1997.

  7. J. D. Helmann. Compilation and analysis of Bacillus subtilis σA-dependent promoter sequences: evidence for extended contact between RNA polymerase and up-stream promoter DNA. Nucleic Acids Research, 23(13):2351–2360, 1995.

  8. L. C. K. Hui. Color set size problem with applications to string matching. In CPM Proceeding, vol. 644 of LNCS, 230–243, 1992.

  9. S. Karlin, F. Ost, and B. E. Blaisdell. Patterns in DNA and amino acid sequences and their statistical significance. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, 133–158, 1989.

  10. C. J. McInerny, J. F. Patridge, G. E. Mikesell, D. P. Creemer, and L. L. Breeden. A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, CDC46, and CDC47 promoters activates M/G1-specific transcription. Genes and Development, 11:1277–1288, 1997.

  11. E. Myers. A sub-linear algorithm for approximate keyword matching. Algorithmica, 12(4–5):345–374, 1994.

  12. L. Marsan and M.-F. Sagot. Extracting structured motifs using a suffix tree-algorithms and application to promoter consensus identification. RECOMB 2000.

  13. P. Pevzner and S.-H. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. ISMB, 269–278, 2000.

  14. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge.

  15. E. Rocke and M. Tompa. An algorithm for finding novel gapped motifs in DNA sequences. RECOMB, 228–233, 1998.

  16. B. Schieber and U. Vishkin. On Finding Lowest Common Ancestors: Simplification and Parallelization. SIAM Journal on Computing, 17:1253–1262, 1988.

  17. M.-F. Sagot. Spelling approximate repeated or common motifs using a suffix tree. LNCS 1380:111–127, 1998.

  18. J. F. Tomb et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature, 388:539–547, 1997.

  19. E. Ukkonen. Approximate string matching over suffix trees. LNCS 684:228–242, 1993.

  20. A. Vanet, L. Marsan, A. Labigne, and M.-F. Sagot. Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori σ80 family of promoter signals. J. Mol. Biol., 297:335–353, 2000.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Adebiyi, E.F., Kaufmann, M. (2002). Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_11

  • DOI: https://doi.org/10.1007/3-540-45784-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive
