Extracting Approximate Patterns

Extended Abstract

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

The number of approximate patterns in a sequence is exponential. In this paper, we present a new notion of basis for the patterns with don’t cares occurring in a given text (sequence). The primitive patterns are of interest because they are fewer in number than under previously known definitions (and, in one case, sub-linear in the size of the text), and because they can be used to extract all the patterns of a text.

We present an incremental algorithm that computes the primitive patterns occurring at least $q$ times in a text of length $n$, given the $N$ primitive patterns occurring at least $q-1$ times, in time $O(|\Sigma| N n^2 \log^2 n \log\log n)$. In the particular case where $q = 2$, the time complexity is only $O(|\Sigma| n^2 \log^2 n \log\log n)$. We also give an algorithm that decides whether a given pattern is primitive in a given text.
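
To make the vocabulary concrete, the following is a minimal sketch of what it means for a pattern with don't cares to occur at least q times in a text. It is not the algorithm of the paper: it uses naive position-by-position matching, and the wildcard symbol `.` and the function names `occurrences` and `meets_quorum` are assumptions introduced only for illustration.

```python
DONT_CARE = "."  # assumed wildcard symbol; the abstract does not fix a notation


def occurrences(pattern: str, text: str) -> list[int]:
    """Return every start position where pattern matches text.

    A don't care in the pattern matches any single text symbol.
    Naive matching, quadratic in the worst case.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == DONT_CARE or p == text[i + j] for j, p in enumerate(pattern))
    ]


def meets_quorum(pattern: str, text: str, q: int) -> bool:
    """True if pattern occurs at least q times in text."""
    return len(occurrences(pattern, text)) >= q


if __name__ == "__main__":
    text = "abcaacabc"
    print(occurrences("a.c", text))      # start positions of the matches
    print(meets_quorum("a.c", text, 3))  # quorum check with q = 3
```

The quorum test here only counts occurrences; deciding whether a pattern is primitive requires the definitions given in the paper itself and is not attempted in this sketch.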

Supported by the Ministry of Research (France) with the “Bio-ingénierie 2000” program, and the CIFRE convention, in a collaboration between ExonHit Therapeutics and ABISS — http://johann.jalix.org/research

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pelfrêne, J., Abdeddaïm, S., Alexandre, J. (2003). Extracting Approximate Patterns. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_24

  • DOI: https://doi.org/10.1007/3-540-44888-8_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
