Extracting Approximate Patterns

Extended Abstract
  • Johann Pelfrêne
  • Saïd Abdeddaïm
  • Joël Alexandre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

The number of approximate patterns in a sequence is exponential. In this paper, we present a new notion of basis for the patterns with don’t cares that occur in a given text (sequence). The primitive patterns that form this basis are of interest because there are fewer of them than under previously known definitions (in one case, their number is sub-linear in the size of the text), and because they can be used to extract all the patterns of a text.

We present an incremental algorithm that computes the primitive patterns occurring at least $q$ times in a text of length $n$, given the $N$ primitive patterns occurring at least $q-1$ times, in time $O(|\Sigma| N n^2 \log^2 n \log\log n)$. In the particular case where $q = 2$, the time complexity is only $O(|\Sigma| n^2 \log^2 n \log\log n)$. We also give an algorithm that decides whether a given pattern is primitive in a given text.
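To make the notions of "pattern with don't cares" and "occurring at least q times" concrete, here is a minimal sketch in Python. It is a naive scan, not the algorithm of the paper, and it does not test primitivity. The wildcard symbol `.` and the names `occurrences` and `meets_quorum` are assumptions introduced only for this illustration.

```python
DONT_CARE = "."  # assumed wildcard symbol; the abstract does not fix a notation


def occurrences(pattern: str, text: str) -> list[int]:
    """Return the start positions where pattern matches text.

    A DONT_CARE position in the pattern matches any single symbol.
    Naive scan: O(len(text) * len(pattern)) comparisons.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == DONT_CARE or p == text[i + j] for j, p in enumerate(pattern))
    ]


def meets_quorum(pattern: str, text: str, q: int) -> bool:
    """Return True if pattern occurs at least q times in text."""
    return len(occurrences(pattern, text)) >= q


text = "abcabdabc"
print(occurrences("ab.", text))      # [0, 3, 6]
print(occurrences("a.c", text))      # [0, 6]
print(meets_quorum("a.c", text, 3))  # False
```

In this toy text, `ab.` reaches a quorum of 3 while `a.c` only reaches 2, which is the kind of distinction the parameter q controls.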

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Johann Pelfrêne (1, 3)
  • Saïd Abdeddaïm (2)
  • Joël Alexandre (3)
  1. ExonHit Therapeutics, Paris
  2. ABISS, LIFAR, Université de Rouen, Mont Saint Aignan
  3. ABISS, UMR CNRS 6037, Université de Rouen, Mont Saint Aignan
