Extracting Approximate Patterns

Pelfrêne, Johann; Abdeddaïm, Saïd; Alexandre, Joël

doi:10.1007/3-540-44888-8_24

Extended Abstract

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

A sequence can contain exponentially many approximate patterns. In this paper, we present a new notion of basis for the patterns with don’t cares occurring in a given text (sequence). The patterns of this basis, called primitive patterns, are of interest because there are fewer of them than under previously known definitions (in one case, their number is sub-linear in the size of the text), and because all the patterns of a text can be extracted from them.
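To see why the number of patterns grows exponentially, here is a minimal sketch that lists every pattern obtainable from a single word by masking its interior letters. The don't-care symbol "." and the convention that a pattern starts and ends with a solid letter are assumptions made for illustration; the abstract does not state either, and the function name is hypothetical.

```python
from itertools import product

DONT_CARE = "."  # assumed wildcard symbol; the paper's notation may differ


def masked_patterns(word):
    """Yield every pattern obtained from `word` by replacing any subset of
    its interior letters with the don't-care symbol (both ends stay solid)."""
    if len(word) < 2:
        yield word
        return
    interior = word[1:-1]
    # Each interior position is either kept (False) or hidden (True).
    for mask in product((False, True), repeat=len(interior)):
        middle = "".join(
            DONT_CARE if hide else ch for ch, hide in zip(interior, mask)
        )
        yield word[0] + middle + word[-1]


patterns = list(masked_patterns("ACGTA"))
print(len(patterns))  # 2 ** 3 = 8 patterns from one 5-letter word
```

A word of length m already yields 2^(m-2) such patterns, each of which occurs wherever the word occurs, which is why a compact basis is attractive.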

We present an incremental algorithm that computes the primitive patterns occurring at least q times in a text of length n, given the N primitive patterns occurring at least q−1 times, in time O(|Σ| N n² log² n log log n). In the particular case where q = 2, the time complexity is only O(|Σ| n² log² n log log n). We also give an algorithm that decides whether a given pattern is primitive in a given text.
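The quorum condition "occurring at least q times" can be illustrated with a naive scan. This is only a sketch of what it means for a pattern with don't cares to occur in a text: it is not the incremental algorithm of the paper, it does not test primitivity, and the "." wildcard and both function names are assumptions introduced here.

```python
DONT_CARE = "."  # assumed wildcard symbol; matches any single letter


def occurrences(pattern, text):
    """Return the start positions where `pattern` matches `text`,
    treating each don't-care symbol as matching any letter."""
    positions = []
    for start in range(len(text) - len(pattern) + 1):
        window = text[start:start + len(pattern)]
        if all(p == DONT_CARE or p == c for p, c in zip(pattern, window)):
            positions.append(start)
    return positions


def meets_quorum(pattern, text, q):
    """True if `pattern` occurs at least q times in `text`."""
    return len(occurrences(pattern, text)) >= q


print(occurrences("A.A", "ABACADA"))      # [0, 2, 4]
print(meets_quorum("A.A", "ABACADA", 3))  # True
```

The scan compares the pattern against every window of the text, so it takes time proportional to the product of the two lengths; it is meant to fix the definitions, not to match the bounds quoted above.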

Supported by the Ministry of Research (France) with the “Bio-ingénierie 2000” program, and the CIFRE convention, in a collaboration between ExonHit Therapeutics and ABISS — http://johann.jalix.org/research

Author information

Authors and Affiliations

ExonHit Therapeutics, 65, Boulevard Masséna, 75013, Paris
Johann Pelfrêne
ABISS, LIFAR, Université de Rouen, 76821, Mont Saint Aignan
Saïd Abdeddaïm
ABISS, UMR CNRS 6037, Université de Rouen, 76821, Mont Saint Aignan
Johann Pelfrêne & Joël Alexandre

Editor information

Editors and Affiliations

Depto. de Ciencias de la Computación, Universidad de Chile, Blanco Encalada 2120, Santiago, 6511224, Chile
Ricardo Baeza-Yates
Escuela de Ciencias Físico-Matemáticas, Universidad Michoacana, Edificio “B”, ciudad universitaria, Morelia Michoacán, Mexico
Edgar Chávez
Université de Marne-la-Vallée, 77454, Marne-la-Vallée Cedex 2, France
Maxime Crochemore

About this paper

Cite this paper

Pelfrêne, J., Abdeddaïm, S., Alexandre, J. (2003). Extracting Approximate Patterns. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_24

DOI: https://doi.org/10.1007/3-540-44888-8_24
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive
